Incorrect Citation Association for Articles in Online-Only Springer Nature Journals

Tamás Kriváchy

TL;DR

The study identifies widespread distortions in citation metrics for online-only Springer Nature journals, caused by an inconsistent transition from page-based to article-number referencing and by improper handling of that change in the publisher's article metadata API. Misattributed references typically resolve to Article Number 1 of a volume, which inflates that article's citation count while many other articles lose citations. By analyzing Crossref, OpenCitations, Semantic Scholar, and journal sites, the author demonstrates false attributions (I.1/I.2) and related miscitations (I.3) that have likely persisted since approximately 2011 and are estimated to affect the citation counts of millions of authors. The work highlights substantial implications for literature search, impact metrics (e.g., SNIP, IF), and career evaluation, and calls for an urgent fix by Springer Nature along with coordinated updates by major data providers and a move toward unified citation metadata. The findings emphasize the need for transparent remediation, re-evaluation of affected metrics, and robust metadata standards to prevent recurrence in the scholarly ecosystem.

Abstract

We show that citation metrics of journal articles in many of the online-only Springer Nature journals and associated ones are distorted, going back to articles from 2001. We find that most likely due to an API response error, there are many incorrect references which typically lead to Article Number 1 of a given Volume. Among others, the issue affects journals such as Scientific Reports, Nature Communications, Communications journals, Cell Death & Disease, Light: Science & Applications, as well as many BMC, Discovery and npj journals. Beyond the negative effect of introducing incorrect reference information, this distorts the citation statistics of articles in these journals, with a few articles being massively over-cited compared to their peers, while many lose citations; e.g. both in Scientific Reports and in Nature Communications, 5 of the 10 top cited articles have article numbers of 1. We validate the distorted statistics by assessing data from multiple scientific literature databases: Crossref, OpenCitations, Semantic Scholar, and the journals' websites. The issue primarily arises from the inconsistent transition from page-based referencing of articles to article number-based referencing, as well as the improper handling of the change in the publisher's article metadata API. It seems that the most pressing problem has been present since approximately 2011, which we estimate affects the citation count of millions of authors.

Paper Structure

This paper contains 14 sections, 3 equations, 15 figures.

Figures (15)

  • Figure 1: Citation count histogram for Nature Communications Volume 16 (the volume of the current year, 2025) according to Crossref for articles that were published on the same day as Article Number 1 of this volume (data as of 2nd of October 2025). The citation count for Article Number 1 in Vol. 16 is depicted with the dashed red line.
  • Figure 2: Citation count histogram for a) Scientific Reports b) Nature Communications and c) BMC Public Health according to Crossref for articles published on or near the publication date of Article Number 1 for the years a) 2018 to 2025 b) 2019 to 2025 c) 2002 to 2025 (2022 excluded for Nature Communications, since Article 1 was a correction; citation count as of 2nd of October 2025). The citation counts of Article Number 1 for each year are depicted with dashed red lines. Citation counts are normalized among years such that the mean is 0 and standard deviation is 1, so they can be plotted together. For other citation count sources and unnormalized data plots, see the appendix on histograms.
  • Figure 3: Citation counts ranked for all published articles in a) Scientific Reports b) Nature Communications and c) BMC Public Health according to Crossref (data from approximately 2nd of November), with a log-log plot of the same data in the insets. Article 1 ranks are marked with dashed red lines. In both Scientific Reports and Nature Communications, 5 of the top 10 cited articles have article numbers of 1.
  • Figure 4: Evidence of temporal extent of issues. a) Already in 2020, the citation count histogram of Nature Communications Vol. 3 was distorted. Citation counts were taken from archived webpages of the journal's website using the Internet Archive's Wayback Machine. The citation count of Article 1 was taken from 22 September 2020, and those of the comparison articles from the first available 2021 webpage archive. b) Normalized citation count histogram for BMC Public Health (data from Crossref) plotted in two parts: pre-2011 and post-2011, the year when the SpringerLink API was introduced. For the post-2011 years, the normalized citation count of Article 1 was only as low as in the pre-API era for the years 2016, 2024 and 2025. A similar effect is observed for other data sources (see the appendix on histograms).
  • Figure 5: Screenshot of the webpage of Article 1 from 2018 in Nature Communications (titled Structural absorption by barbule microstructures of super black bird of paradise feathers), archived on 16th of October 2022. None of the articles in the "This article is cited by" section actually cite the current article. This can immediately be suspected from the titles, but it can also be verified manually.
  • ...and 10 more figures
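
The normalization mentioned in the caption of Figure 2 (mean 0, standard deviation 1 within each year) can be illustrated with a minimal sketch. This is not the author's code. It assumes that each year is normalized separately, that the population standard deviation is used, and that Article 1 is compared against the mean and spread of its same-day peers rather than being included in them. The function names and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def normalize_counts(counts):
    """Rescale citation counts to mean 0 and standard deviation 1."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation (an assumption)
    if sigma == 0:
        # All counts identical: no spread to scale by, so return zeros.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]


def article_one_z(article_one_count, peer_counts):
    """Express Article 1's count in units of its peers' standard deviation."""
    mu = mean(peer_counts)
    sigma = pstdev(peer_counts)
    if sigma == 0:
        raise ValueError("peer counts have zero spread")
    return (article_one_count - mu) / sigma


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration only.
    peers_by_year = {2024: [2, 4, 6], 2025: [0, 1, 2]}
    article_one_by_year = {2024: 20, 2025: 9}
    for year, peers in peers_by_year.items():
        z_peers = normalize_counts(peers)
        z_one = article_one_z(article_one_by_year[year], peers)
        print(year, z_peers, z_one)
```

Because each year is rescaled by its own mean and spread, years with very different raw citation levels can share one histogram, and an Article 1 that sits many standard deviations above its peers stands out regardless of the year's absolute counts.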