LLMs & AI tools

Artificial intelligence (AI)-driven language processing tools are becoming increasingly relevant in scientific writing, with new tools emerging all the time. Their widespread availability and use raise both hopes and fears regarding their long-term impact on publishing, authorship, and scientific writing.

This page tries to give an overview of the most prevalent questions in the form of a living document. Please note that it is neither complete nor an official position of Lib4RI or the four RIs.

Generative AI tools

  • ChatGPT is an AI language model developed by OpenAI. It is part of the GPT series, with GPT-4o being the latest version. It is designed to understand and generate natural language responses, and it is thus capable of engaging in conversations, answering questions, and providing information or assistance across a wide range of topics by following instructions from user prompts.

    The model is trained on a diverse range of internet text, but it is important to keep in mind that it does not automatically browse the internet or access real-time information. This means that while it can provide information that is current up to its training data cutoff (April 2023), it may not have the latest information on events or developments after that date. One option to use GPT-4 with internet access for free is Microsoft Copilot (formerly Bing Chat).

  • While the full functionality of ChatGPT-4 (including DALL·E, plugins, custom GPTs, etc.) is only available via a subscription plan, there are several services that you can use for free. The following list is of course incomplete; it is just a selection of the most well-known ones:

    • Claude 3.5 Sonnet: Released in June 2024, Claude 3.5 Sonnet is Anthropic's newest generative AI chatbot version. It is free to use up to a certain limit, and sharing user data is opt-in (instead of opt-out as with e.g. ChatGPT). Its training data cutoff is April 2024; it does not have access to the internet. With the new "Artifacts" feature, it is possible to create code, documents, web designs, etc. that are displayed in a side window, making it easy to build on and change the creations in real time.
    • Microsoft Copilot: Microsoft’s free-to-use chatbot, formerly Bing Chat, has internet access and is able to cite sources, generate images, and write lyrics and music via a plugin. It is built on GPT-4 and works best in creative or precise mode.
    • Gemini: Google’s chatbot (formerly known as Bard) has internet access and integration in Google apps. A free version, as well as a subscription “Pro” / “Advanced” version, are available. It is built on Google’s own Gemini model family and is able to generate text, images, code, etc.
    • Perplexity AI: A conversational search engine that uses natural language predictive text to answer queries. While its conversational skills are less sophisticated than ChatGPT-4’s, it is able to use sources from the web and reference them. It has access to GPT-4, Claude 2, and Gemini Pro.
    • GPT4All: GPT4All, created by Nomic AI, is an ecosystem to train and deploy customised LLMs that run locally on consumer-grade CPUs, without an internet connection or a GPU. The models are free and open source and can be used to integrate LLMs into applications without a subscription. Your data and chats are only saved on your local hardware, unless you intentionally share them with GPT4All to help grow their models.
    • TalkAI: TalkAI is a virtual assistant / AI-powered chatbot that can be integrated directly into WhatsApp and Telegram. It is built on GPT-3.5 and offers voice response, commands, and translation.
    • Mistral AI: This Open Source AI model, developed by a French startup of the same name, is free for everyone to use and modify. With some know-how, the models are adjustable.
    • LLaMA 2: A family of pre-trained, fine-tuned Open Source LLMs developed by Meta. They are free of charge for research and commercial use and can be downloaded in three different model sizes. With some know-how, the models are adjustable.
  • Yes, you can choose not to share your chat history with OpenAI for model training via your account settings (under data controls).

AI tools in scientific writing

  • The use of generative AI tools in scientific writing poses a number of risks, and it is crucial to be mindful of them. Here is a selection of points to consider:

    • Misinformation and hallucinations: Generative AI works statistically, and it is not able to judge whether something is true or untrue, ethical or unethical. It is therefore possible that the tools produce false or inaccurate information, make up references, etc., all while sounding very confident and factual. Always verify outputs against reliable sources.
    • Unidentifiable sources: Many AI tools do not cite their sources, and it is very probable that they mix together content they have encountered before. It is also often unclear on which specific data a model was trained.
    • Bias: Outputs are not neutral; they carry a certain bias stemming from their training data, algorithms, and programming. Since the models need to make certain classifications and assumptions for efficiency reasons, they tend to reinforce stereotypes and marginalise minority perspectives (see e.g. Feng et al. 2023).
    • Interference with learning, agency and critical thinking: Generative AI tools readily offer “nice” sounding answers; they can provide a solution without highlighting the process behind it. An over-reliance on these tools can therefore interfere with the learning process of discovery and independent research. Furthermore, relying too heavily on these tools can reduce one’s own agency in decision-making during the process and critical thinking regarding which elements should be considered or included.
    • Time costs: AI-based tools promise increased efficiency. However, getting to know these tools, and especially crafting effective prompts that generate the desired output, is very time-consuming. The time saved on writing and independent research may simply be spent on prompting and verifying the AI’s output instead. Moreover, if these steps are skipped, the increased efficiency of using generative AI tools may come at the cost of attention to detail, precision, and depth.
    • Writing style: The use of generative AI poses several risks for writing style: one’s personal style might be lost, creativity may be limited, styles may become inconsistent (whether because different tools are used or simply by chance), context-specific vocabulary might be unknown to the tool, the tone might be off, empathy could be missing, etc.
    • Plagiarism and academic integrity: While the use of AI-based tools does not in itself constitute plagiarism, you need to make sure you comply with all applicable guidelines and laws (e.g. department or institute guidelines, research integrity guidelines, citation etiquette, declarations of originality, journal or publisher guidelines, copyright law, etc.). It is good practice to transparently document the use of AI-based tools in a scientific work. Check if and how you should declare the use of AI-based tools, and adhere to the specific rules of your citation style. Furthermore, be mindful that any text you feed into AI-based tools can potentially be reused for training; if your text contains sensitive or confidential information or data, do not input it into such tools.

Further reading & general information

  • Glossary of common AI terms and abbreviations:
    • AI detector: A tool designed to detect when a text was AI-generated (highly unreliable)
    • Algorithm: A finite sequence of instructions followed by a computer system
    • Alignment: The extent to which an AI’s goals are in line with its creators’ goals
    • Anthropomorphism: The attribution of human traits to non-human entities
    • Artificial general intelligence (AGI): AI that surpasses human intelligence
    • Artificial intelligence (AI): Intelligence demonstrated by machines
    • Automation: Handling a process with machines or software so that less human input is needed
    • Autonomous: Able to perform tasks without human input
    • Bias: Assumptions that an AI makes to simplify its tasks, stemming from its training and training data
    • Big data: Very large datasets that normal data-processing software cannot handle
    • CAPTCHA: A test used online to ensure that the user is human
    • Chatbot: A software application that mimics human conversation, usually through text
    • Chinese room: A philosophical thought experiment about AI
    • Deep learning (DL): A form of machine learning based on neural networks
    • Deepfake: AI-generated images and videos designed to look real
    • Emergent behavior: Complex behavior resulting from basic processes
    • Generative AI: AI systems that generate output in response to prompts
    • Generative pre-trained transformer (GPT): A type of LLM used in ChatGPT and other AI applications
    • Hallucination: A tendency of AI chatbots to confidently present false information
    • Large language model (LLM): A neural net trained on large amounts of text to imitate human language
    • Machine learning (ML): The study of how AI acquires knowledge from training data
    • Machine translation: Use of software to translate text between languages
    • Natural language processing (NLP): The study of interaction between computers and human language
    • Neural net(work): Computer systems designed to mimic brain structures
    • Parameter: A variable in an AI system that it uses to make predictions
    • Perplexity: A measurement of how unpredictable a text is
    • Prompt: The input from the user to which the AI system responds
    • Reinforcement learning from human feedback (RLHF): A training method used to fine-tune GPT model responses
    • Temperature: The level of randomness in an LLM’s output
    • Token: The basic unit of text (a word or part of a word) processed by LLMs
    • Training data: The dataset that was used to train an AI system
    • Turing test: A test of a machine’s ability to display human intelligence
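  • Two of the glossary terms above, temperature and perplexity, can be made concrete with a few lines of code. The following is a minimal sketch using only toy numbers (it is not tied to any particular LLM or library); it shows how temperature rescales a model's output distribution and how perplexity summarises how predictable a text was to the model:

    ```python
    import math

    def softmax_with_temperature(logits, temperature=1.0):
        """Convert raw model scores (logits) into probabilities.
        Higher temperature flattens the distribution (more random sampling);
        lower temperature sharpens it (more deterministic output)."""
        scaled = [x / temperature for x in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    def perplexity(token_probs):
        """Perplexity = exp of the average negative log-probability the
        model assigned to each observed token. Lower = more predictable."""
        n = len(token_probs)
        return math.exp(-sum(math.log(p) for p in token_probs) / n)

    # Toy logits for a 4-token vocabulary
    logits = [2.0, 1.0, 0.5, -1.0]
    cold = softmax_with_temperature(logits, temperature=0.5)
    hot = softmax_with_temperature(logits, temperature=2.0)
    print(cold[0] > hot[0])  # True: low temperature concentrates mass on the top token

    # A model that assigns probability 0.25 to every token has perplexity 4
    print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0 (up to rounding)
    ```

    At temperature 0 the model would always pick the most likely token; very high temperatures make every token nearly equally likely, which is why high-temperature output reads as more "creative" but also more error-prone.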
  • The discussion about the usage of generative AI tools in scientific writing has greatly increased; regulations are being defined, and advantages and limitations are being evaluated. Below, you can find a selection of interesting articles on the topic:

  • As institutions and publishers face the fact that authors may use AI-based tools to (help) write scientific articles, they have begun to set up guidelines, best practices, and regulations regarding their usage. Here are some examples of information from publishers and Swiss universities:

    • American Geophysical Union AGU (Report on use of AI in Earth, Space and Environmental Sciences)
    • Bern University (blog about ChatGPT)
    • Bern University (FAQs and official guidelines)
    • Elsevier (FAQ on ChatGPT usage)
    • ETH Zürich (information about ChatGPT)
    • ETH Zürich (information about AI and academic integrity - meant as a conversation, not an official regulation)
    • European Commission: Living guidelines on the responsible use of generative AI in research
    • PSI (position paper on AI driven language processing tools)
    • Science (does not allow the usage of AI generated content)
    • SNSF (news on "The SNSF’s approach to the use of artificial intelligence in funding applications")
    • Springer Nature (allows the usage of AI generated content which is transparent, but not adding the tool as a co-author)
    • STM (White Paper - Generative AI in Scholarly Communications: Ethical and Practical Guidelines for the Use of Generative AI in the Publication Process)
    • Taylor & Francis (allows the usage of AI generated content which is transparent, but not adding the tool as a co-author)
    • University of Geneva (article about ChatGPT)
    • Wiley (allows the usage of AI generated content which is transparent, but not adding the tool as a co-author)
    • ZHAW (blog including official guidelines for the use of generative AI systems)