Sneaked references: Cooked reference metadata inflate citation counts
Lonni Besançon, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov
TL;DR
This study reveals a metadata-level vulnerability in the DOI registration workflow: publishers can inject extra references into Crossref metadata that do not appear in the article text, inflating citation counts downstream. By collecting and comparing reference lists from Crossref, publisher websites, and Dimensions for a three-journal case study, the authors quantify sneaked references (~9.1%) and lost references (at least ~56% in Dimensions), demonstrating manipulations that bypass reader-visible content. They formulate a two-tier detection framework using $\delta^p_x = R^p_x - S^p$ to classify publications as OK, Sneaked, or Missing, and compute lower bounds $\Delta$ on the numbers of sneaked and missing references. The work highlights the vulnerability of the bibliometric ecosystem, discusses potential countermeasures (coherence checks, third-party audits, open APIs, and updated guidelines), and calls for broader verification to ensure accurate citation data and deter gaming of indicators.
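The OK/Sneaked/Missing classification and the lower bounds $\Delta$ can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes $R^p_x$ is the number of references that source $x$ (e.g., Crossref or Dimensions) lists for publication $p$ and $S^p$ is the number of references printed in the article itself. The `Publication` type, the function names, and the `10.9999/...` DOIs are hypothetical. Summing only the positive (or only the negative) parts of $\delta$ yields lower bounds because a count comparison sees only net differences: a publication that gains some references and loses others partly or fully cancels out.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical record holding reference counts for one publication p."""
    doi: str
    registered: int  # R^p_x: references listed by source x (assumed to be a count)
    in_text: int     # S^p: references printed in the article itself


def delta(pub: Publication) -> int:
    """delta^p_x = R^p_x - S^p."""
    return pub.registered - pub.in_text


def classify(pub: Publication) -> str:
    """Label a publication OK, Sneaked, or Missing from the sign of delta."""
    d = delta(pub)
    if d == 0:
        return "OK"
    return "Sneaked" if d > 0 else "Missing"


def lower_bounds(pubs: list[Publication]) -> tuple[int, int]:
    """Return (Delta_sneaked, Delta_missing).

    Only the net difference is visible per publication, so a paper that
    gains k references and loses k others looks OK; the sums are therefore
    lower bounds on the true totals.
    """
    sneaked = sum(max(delta(p), 0) for p in pubs)
    missing = sum(max(-delta(p), 0) for p in pubs)
    return sneaked, missing


if __name__ == "__main__":
    pubs = [
        Publication("10.9999/hypothetical.1", registered=30, in_text=30),
        Publication("10.9999/hypothetical.2", registered=42, in_text=30),
        Publication("10.9999/hypothetical.3", registered=10, in_text=25),
    ]
    for p in pubs:
        print(p.doi, classify(p))  # OK, Sneaked, Missing
    print(lower_bounds(pubs))  # (12, 15)
```

Running the same classification once per source $x$ (Crossref, Dimensions) against the same $S^p$ separates problems introduced at registration time from those introduced at indexing time.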
Abstract
We report evidence of an undocumented method to manipulate citation counts involving 'sneaked' references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study focusing on three journals from a given publisher, we found that at least 9% of the references (5,978/65,836) were sneaked, mainly benefiting two authors. Although absent from the articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered 'lost' references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references listed in the HTML version of the publications. The extent of sneaked and lost references in the global literature remains unknown and requires further investigation. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.
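Counting references reveals only net discrepancies; a coherence check can instead match individual references to name exactly which ones were sneaked or lost. The sketch below is an illustration, not the authors' method, and rests on assumptions: every reference is assumed to carry a DOI (references without one would need fuzzy matching on title and authors, not shown), and `normalize_doi` and `coherence_check` are hypothetical names. DOI names are case-insensitive, so they are lower-cased before comparison. Passing a registry's list (e.g., Crossref metadata) as `listed` surfaces sneaked references; passing a bibliometric platform's indexed list surfaces lost ones under the `missing` key.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver/scheme prefixes."""
    d = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    return d


def coherence_check(listed: list[str], in_text: list[str]) -> dict[str, set[str]]:
    """Compare the DOIs a source lists for an article with those in its text.

    'sneaked': listed by the source but absent from the article.
    'missing': present in the article but absent from the source.
    Assumes every reference is identified by a DOI (hypothetical simplification).
    """
    listed_set = {normalize_doi(d) for d in listed}
    text_set = {normalize_doi(d) for d in in_text}
    return {"sneaked": listed_set - text_set, "missing": text_set - listed_set}


if __name__ == "__main__":
    # Hypothetical reference lists for a single article.
    registry = ["https://doi.org/10.9999/A", "10.9999/B", "doi:10.9999/C"]
    article = ["10.9999/a", "10.9999/c", "10.9999/d"]
    print(coherence_check(registry, article))
```

Here `10.9999/b` appears only in the registry (sneaked) and `10.9999/d` only in the article (missing).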
