Detection of the papermilling behavior
Igor Podlubny
TL;DR
The paper addresses papermilling by analyzing Web of Science data to identify distinctive publication-citation dynamics, and it introduces the integrity index ($I$-index) as an indicator that distinguishes substantive output from the mere production of papers. It provides a MATLAB tool to quantify correlations, delays, and growth patterns, and demonstrates the method through a case study in mathematics, along with examples from other fields. The findings indicate that strong publication-citation synchronization, surges in yearly output, and low $I$-index values are typical of papermilling, which is driven in part by incentive structures surrounding promotions and open-access economics. The work advocates multi-criteria screening and editorial vigilance to limit the erosion of scientific integrity across disciplines.
Abstract
Typical signs of possible papermilling behavior are described, quantified, and illustrated with examples, based on an analysis of data obtainable from the Web of Science publication and citation database. A MATLAB function is provided for analyzing Web of Science outputs. A new quantitative indicator, the integrity index (I-index), is proposed for use alongside standard bibliographic and scientometric indicators. A case study is presented.
