At Sage, we recognize and champion new technology that facilitates conducting, writing and disseminating research. A multitude of tools and technologies have been developed in recent years that can increase productivity and aid those writing in English as a second or third language.
We also recognize that the increasingly widespread use of generative AI/LLMs blurs the lines between human-generated and machine-generated text, calling into question some of our usual assumptions and policies on authorship. This technology may be used by bad actors to create fabricated submissions and attempt to subvert the peer-review process for commercial gain. The currently available language models are not fully objective or factual. Authors using generative AI to write their research must make every effort to ensure that the output is factually correct and that the references provided reflect the claims made.
We’ve put together this guide for Editors on the use of LLMs and AI in scholarly publishing. As the technology improves and we adapt to using these tools, we will likely develop this guidance further. Further resources that may be useful are listed at the bottom of the page. You could also have a look at our Sage Campus course, Introduction to Artificial Intelligence.
If you have any questions or concerns about AI tools/LLMs do contact your Sage Editor in the first instance.
Our policy on the use of LLMs in submissions can be found on our Publishing Policies pages: ChatGPT and Generative AI
LLMs such as ChatGPT have been seen to falsify references or insert incorrect ones in their essays or summaries. We have seen several instances where ChatGPT provided citations to publications that do not exist.
In addition, AI tools or LLMs may generate outputs that appear or sound plausible but cannot be validated or justified by their source data. This phenomenon, referred to as hallucination, can arise either directly from the source data or from the way the model is trained. Frequently occurring hallucinations are a considerable problem with this technology.
LLMs may be exploited by bad actors to generate fabricated text, or text stitched together from various sources on the internet. While this may be appropriate for summarizing complex information for further study, it is inappropriate for primary research articles, which must contain critical new information, either offering a new perspective or containing novel data. LLM-generated text in primary research may only be detected using AI detection tools, and these tools cannot currently detect falsified references in text.
In addition, there are growing concerns that these tools may be used to generate images that are reported as primary data but were in fact generated by AI. Before incorporating AI-generated material into your scholarly work, apply the CRAAP test (Currency, Relevance, Authority, Accuracy, Purpose) to the outputs to avoid spreading misinformation or bias.
The use of AI or LLMs for editorial work presents confidentiality and copyright issues. Because these tools learn from what they receive over time and may use it to provide outputs to others, we ask Editors not to use them to triage manuscripts or create summaries. For the same confidentiality and copyright reasons, you should not use these tools to summarize reviews or write decision letters.
You could use ChatGPT or other AI-based tools to look for reviewers in the subject area. Due to concerns around spurious text generation, we ask that you verify a suggested reviewer’s identity before inviting them to review a submission. Verification should typically involve checking their publication record and/or institutional profiles via a basic Google/internet search.
While LLMs can create a critical summary that looks like a review report, such a summary is unlikely to capture the reviewer’s experience as a researcher in the field, any local or contextual nuances of the study, or the impact the study may have on various populations. We ask that Editors ensure the reviewers they invite are aware of the confidentiality issues presented by generating a review report using language models or generative AI. If an Editor is concerned that a review report appears to have been generated by ChatGPT or another tool, they should flag this to Sage for advice.
It continues to be important to use vetted reviewers who are experts in the field. Using reviewers who lack specific expertise, or who cannot be verified, increases the risk of machine-written content passing peer review and masquerading as genuine human writing.
Careful reading of the text is crucial to determine whether a submission was written by generative AI. As Editors, we rely on your subject-level expertise to discern whether an article makes sense at the sentence level as well as at the overall document level. If a sentence or paragraph does not make sense, or appears to be machine generated, please query it with the authors or raise it with Sage for advice. We recommend looking out for tell-tale ChatGPT interface artifacts, such as “Regenerate response”, left in the text.
This is a constantly evolving landscape: LLMs are developing fast, and work on appropriate detection methods has been likened to an arms race. We have identified some free tools outside our submission system that will allow us to deepen our understanding of AI-generated content in our submissions and determine whether any currently available tools would help with detection. We are running a pilot on some journals to understand which tools may be useful.
NB: many of the key traits differentiating human-written text from AI-generated text, including the use of colloquial and emotional language, are not traits that academic scientists typically display in formal writing, so differences or anomalies in this respect would not necessarily translate to academic writing.