Detection of metadata manipulations: Finding sneaked references in the scholarly literature
Lonni Besançon, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov, Jules di Scala, Dominika Tkaczyk, Kathryn Weber-Boer
TL;DR
The paper investigates sneaked references, that is, citations registered in a publication's metadata but absent from its published reference list and full text, and documents a substantial instance in IJISRT. It evaluates three automated detection methods: a baseline, M0, used for lower-bound estimation, and two further approaches, M1 and M2. All three contrast Crossref metadata against references or text extracted from PDFs. Using a large-scale dataset of 47,170,721 documents and 2,782 Crossref records, it identifies 80,205 sneaked references, all benefiting IJISRT, with some papers accumulating thousands of undue citations (e.g., 6,059). The work highlights metadata vulnerabilities in scholarly systems and suggests practical strategies for validation and scale-up to curb citation gaming.
Abstract
We report evidence of a new set of sneaked references discovered in the scientific literature. Sneaked references are references registered in the metadata of publications without being listed in the reference section or in the full text of the actual publications where they ought to be found. We document here 80,205 references sneaked into the metadata of the International Journal of Innovative Science and Research Technology (IJISRT). These sneaked references are registered with Crossref and all cite, and thus benefit, this same journal. Using this dataset, we evaluate three different methods to automatically identify sneaked references. These methods compare reference lists registered with Crossref against the full text or the reference lists extracted from PDF files. In addition, we report attempts to scale the search for sneaked references to the scholarly literature.
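
To make the comparison concrete, here is a minimal sketch of the general idea: take the DOIs registered for a publication's references and flag those that never appear in text extracted from its PDF. It is not the authors' implementation of any of the three methods. The function names, the DOI pattern, and the normalization rules are all assumptions made for illustration, and the inputs are assumed to have been obtained already (registered DOIs from the metadata record, plain text from the PDF).

```python
import re

# Loose DOI pattern: an assumption for illustration, not the paper's matching rule.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip resolver prefixes and trailing punctuation."""
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi.rstrip(".,;)")


def dois_in_text(text: str) -> set[str]:
    """Collect every normalized DOI-like string found in the extracted text."""
    return {normalize_doi(match) for match in DOI_PATTERN.findall(text)}


def candidate_sneaked(registered_dois: list[str], fulltext: str) -> list[str]:
    """Return registered reference DOIs that do not occur in the full text.

    These are only candidates: a reference printed without its DOI
    would also be flagged, so results need manual or fuzzy verification.
    """
    found = dois_in_text(fulltext)
    registered = {normalize_doi(doi) for doi in registered_dois}
    return sorted(registered - found)


if __name__ == "__main__":
    # Hypothetical inputs, invented for this example.
    registered = ["10.1000/abc123", "https://doi.org/10.5555/XYZ.9"]
    text = "References\n[1] Smith, J. (2020). A study. doi:10.1000/ABC123."
    print(candidate_sneaked(registered, text))
```

Exact DOI matching is the simplest possible check and will over-flag whenever the printed reference list omits DOIs, so a practical detector would also need to match on titles, authors, or other bibliographic fields.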
