Sneaked references: Cooked reference metadata inflate citation counts

Lonni Besançon, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov

TL;DR

This study reveals a metadata-level vulnerability in the DOI registration workflow: publishers can inject extra references into Crossref metadata that do not appear in the article text, inflating citation counts downstream. By collecting and comparing reference lists from Crossref, publisher websites, and Dimensions for a three-journal case study, the authors quantify sneaked references (at least ~9.1%, 5,978/65,836) and lost references (at least ~56% in Dimensions, 36,939/65,836), demonstrating manipulations that bypass reader-visible content. They formulate a two-tier detection framework using δ^p_x = R^p_x − S^p to classify publications as OK, Sneaked, or Missing and compute lower bounds Δ for sneaked and missing references. The work highlights the vulnerability of the bibliometric ecosystem, discusses potential countermeasures (coherence checks, third-party audits, open APIs, and updated guidelines), and calls for broader verification to ensure accurate citation data and deter gaming of indicators.
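The classification rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that S^p is the number of references counted in the publisher's version of publication p, that R^p_x is the number recorded for p by source x (for example Crossref or Dimensions), and that δ > 0 means Sneaked, δ < 0 means Missing, and δ = 0 means OK. The `Publication` class, the function names, and the sample DOIs and counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Publication:
    doi: str              # hypothetical identifier, used for illustration only
    source_count: int     # S^p: references counted in the publisher's version (assumed meaning)
    registry_count: int   # R^p_x: references recorded by source x (assumed meaning)


def delta(pub: Publication) -> int:
    """delta^p_x = R^p_x - S^p."""
    return pub.registry_count - pub.source_count


def classify(pub: Publication) -> str:
    """Label a publication from the sign of delta (assumed convention)."""
    d = delta(pub)
    if d > 0:
        return "Sneaked"
    if d < 0:
        return "Missing"
    return "OK"


def lower_bounds(pubs: Iterable[Publication]) -> Tuple[int, int]:
    """Sum the surpluses and the deficits separately.

    These are lower bounds: a count comparison cannot see a publication
    where some references were dropped and others added at the same time.
    """
    pubs = list(pubs)
    sneaked = sum(max(delta(p), 0) for p in pubs)
    missing = sum(max(-delta(p), 0) for p in pubs)
    return sneaked, missing


if __name__ == "__main__":
    # Toy data, not taken from the study.
    sample = [
        Publication("10.0000/example.a", source_count=20, registry_count=20),
        Publication("10.0000/example.b", source_count=6, registry_count=47),
        Publication("10.0000/example.c", source_count=30, registry_count=12),
    ]
    for pub in sample:
        print(pub.doi, delta(pub), classify(pub))
    print(lower_bounds(sample))
```

Because only counts are compared, a publication labelled OK may still contain substituted references; content-level matching would be needed to detect that case.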

Abstract

We report evidence of an undocumented method to manipulate citation counts involving 'sneaked' references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study, focusing on three journals from a given publisher, we identified at least 9% sneaked references (5,978/65,836) mainly benefiting two authors. Despite not existing in the articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered 'lost' references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references listed in the HTML version of the publications. The extent of the sneaked and lost references in the global literature remains unknown and requires further investigations. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.

Paper Structure (15 sections, 2 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: References' long path from authors to bibliometric dashboards: after Editorial and Peer-Review assessment, metadata are registered to a DOI provider (here Crossref). Metadata are then retrieved by bibliometric platforms (The Lens, SpringerLink, Dimensions) that provide various services, such as a search engine and bibliometric dashboards for institutions.
  • Figure 2: PubPeer post https://pubpeer.com/publications/A172115FC8D0A5F44B31A18B08BB26 reporting a Hindawi journal article with more citations than downloads. Most citations appear not to match any of the references in the allegedly citing publications. After careful examination, it appeared that these were sneaked references: existing in the metadata only and not in the PDFs of the allegedly 'citing' publications.
  • Figure 3: Reference list for publication https://doi.org/10.32628/IJSRST229212 as registered at Crossref (left: https://api.crossref.org/works/10.32628/IJSRST229212) and as retrieved from Dimensions (right: https://app.dimensions.ai/details/publication/pub.1146638907). Crossref provides the attribute reference-count (highlighted in blue) and a reference list of 47 references (numbers 0 to 9 shown). References 6 to 46 are sneaked references. Dimensions lists 13 references, none of which appears in the original paper (see Figure 4).
  • Figure 4: Reference list in PDF (left) and in HTML (right) versions of https://doi.org/10.32628/IJSRST229212. In this case, the PDF and HTML versions match each other, which is expected.
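The comparison described in the Figure 3 and Figure 4 captions can be approximated on a Crossref works record. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it relies only on the `message`, `reference-count`, `reference`, `key`, and `DOI` fields of a Crossref works response, and it assumes the reader has separately extracted the DOIs visible in the article's PDF or HTML reference list. The helper names and the sample payload are hypothetical, and no network request is made.

```python
from typing import Iterable, List


def normalize_doi(doi: str) -> str:
    """DOIs are case-insensitive, so compare them in lowercase."""
    return doi.strip().lower()


def summarize_work(response: dict) -> dict:
    """Report the declared reference count next to the listed length."""
    message = response.get("message", {})
    references = message.get("reference", [])
    return {
        "declared": message.get("reference-count"),
        "listed": len(references),
    }


def candidate_sneaked(response: dict, article_dois: Iterable[str]) -> List[str]:
    """Return keys of registered references whose DOI is absent from the article.

    References registered without a DOI are skipped here, because they
    cannot be matched by DOI and would need text matching instead.
    """
    visible = {normalize_doi(d) for d in article_dois}
    references = response.get("message", {}).get("reference", [])
    return [
        ref.get("key")
        for ref in references
        if "DOI" in ref and normalize_doi(ref["DOI"]) not in visible
    ]


if __name__ == "__main__":
    # Toy payload shaped like a Crossref works response; values are made up.
    payload = {
        "message": {
            "reference-count": 3,
            "reference": [
                {"key": "ref0", "DOI": "10.0000/Visible.1"},
                {"key": "ref1", "DOI": "10.0000/hidden.2"},
                {"key": "ref2", "unstructured": "A reference without a DOI"},
            ],
        }
    }
    print(summarize_work(payload))
    print(candidate_sneaked(payload, ["10.0000/visible.1"]))
```

A key returned by `candidate_sneaked` is only a candidate: the reference may be genuine but lack a DOI in the article's own list, so each hit still needs manual confirmation against the PDF or HTML version.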