【Text by Observer Net, Wang Yi】On September 23, the news website of the British journal Nature reported that a study has found generative artificial intelligence (AI) tools being used to "rework" existing research papers, producing seemingly new "cloned" studies that have successfully slipped into academic journals.
A study posted on September 12 on the medical preprint platform medRxiv shows that when researchers analyzed papers published in 112 journals over the past four and a half years, they found more than 400 that had been reworked by AI, and confirmed that such papers can bypass publishers' anti-plagiarism detection systems.
The authors of the study suggest that this trend is likely driven by individual opportunists, or even by commercial companies known as "paper mills" that produce and sell paper authorship, using open databases and large language models to mass-produce low-quality, scientifically worthless papers.
"If left unchecked, this AI-driven method could be applied to all open databases, creating an unimaginable number of low-quality papers," warned Csaba Szabó, a pharmacologist at the University of Fribourg in Switzerland, who did not participate in this study. "This could open Pandora's box, with academic literature possibly being flooded by 'synthetic papers.'"
The research team focused on the U.S. National Health and Nutrition Examination Survey (NHANES), a massive database covering health, diet, and lifestyle data for thousands of people. The researchers looked for "repeated" studies: those that examine the same associations between variables but draw their samples from different years, sexes, or age groups.
Their search of the public medical literature database PubMed found 411 such "repeated" studies published between January 2021 and July 2025. Most were simple cases involving two almost identical papers, but three cases each involved six repeated papers, some of which were published multiple times in the same year.
Matt Spick, a biomedical researcher at the University of Surrey in the UK, said bluntly that such situations "shouldn't happen at all, and are unhelpful for the scientific literature."
Spick and his colleagues suspect that some people may also be using AI to bypass the plagiarism detection mechanisms of journals. To verify this, the research team had AI models ChatGPT and Gemini rewrite three highly repetitive research papers they found, generating new manuscripts based on NHANES data.
The result: just two hours of manual correction was enough for these AI manuscripts to pass the plagiarism-detection tools commonly used by most publishers. When the researchers ran the manuscripts through those tools, none scored high enough to be flagged as problematic by editors.
"We were shocked that it worked almost immediately," Spick noted. Although the AI-generated manuscripts did contain some errors, their content was convincing enough to mislead, making it harder to distinguish scholars conducting genuine research on public databases from those deliberately churning out papers with AI.
Igor Rudan, a public health scholar at the University of Edinburgh and co-editor-in-chief of the Journal of Global Health, also believes that "this presents new challenges for editors and publishers." He said, "When we first tried large language models, we anticipated this would become a problem, and this study confirmed our concerns."
As early as July, Spick had warned that the surge in low-quality, "mass-produced" papers based on open datasets like NHANES might be driven by AI. That analysis found that the number of repeated studies rose sharply after ChatGPT's official release in 2022.
Researchers from Stanford University and other institutions analyzed 1.12 million papers on the preprint platforms arXiv and bioRxiv and found that the proportion of computer science papers using AI large language models reached as high as 22% between 2020 and 2024.
This has forced some publishers to tighten their policies. The Swiss open-access academic publisher Frontiers and the Public Library of Science (PLOS) in the United States have both announced tighter editorial review rules for research based on open datasets.
Elena Vicario, Head of Research Integrity at Frontiers, admitted that AI-driven repeated research poses a serious and ongoing challenge for publishers.
Of the duplicate papers identified in the study, Frontiers published 132 over the past four years, accounting for 32% of the total. Vicario said, however, that these appeared before the new editorial rules took effect: since May of this year, Frontiers has rejected 1,382 NHANES-based submissions.
The renowned global research publisher Springer Nature accounts for an even higher share of the duplicate papers, at 37%; its open-access journal Scientific Reports alone published 51 of them.
Richard White, Editor-in-Chief of "Scientific Reports," responded, "We place great importance on the reliability of the research record. All identified papers will be investigated, and we will take necessary measures." He revealed that since the beginning of 2024, "Scientific Reports" has rejected more than 4,500 submissions based on NHANES.
White added that the journal's editorial team focuses on removing unethical and meaningless research while ensuring truly valuable results are published. "We are concerned about the misuse of these databases and have been taking action," he said.
This article is an exclusive article by Observer Net. Without permission, it cannot be reprinted.
Original: https://www.toutiao.com/article/7553590997066727978/