The Retraction Epidemic in Science Across Publishers, Fields, and Countries

Sara Venturini, Alessandra Urbinati, Paola Gallo, Jessica T. Davis, Alessandro Vespignani

Abstract

Retractions serve as an indicator of failures in research integrity, yet most analyses focus on absolute counts rather than risk per paper. We use one of the largest open bibliographic databases to develop incidence metrics normalized by population: retractions per publication and per active author annually. Applying an epidemiological framework that models counts with exposure, we find evidence of exponential growth in retraction incidence, with approximately a 5-year doubling time at both the paper and author levels. These patterns vary significantly across fields, publishers, and countries. While scientific output is becoming more democratized globally, retractions are concentrated in fewer countries, creating a "concentration" paradox that calls for targeted monitoring. Despite exponential growth, the absolute incidence remains low (0.12% in 2021), allowing for corrective intervention. Incidence-based monitoring provides a framework for evaluating policies that safeguard research integrity at scale.
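The "counts with exposure" idea and the doubling time quoted in the abstract can be illustrated with a small sketch. It assumes a Poisson likelihood with a log link and an offset of $\log N_t$, fitted by Newton's method; the paper's figure captions describe a negative binomial likelihood, so this is a simplified stand-in rather than the authors' procedure. The function names and the data are hypothetical and synthetic, generated with a built-in 5-year doubling time.

```python
import math

def fit_exposure_growth(years, retracted, published, iters=50):
    """Fit R_t ~ Poisson(N_t * exp(a + b * (t - t0))) by Newton's method.

    Simplified stand-in: Poisson instead of negative binomial likelihood.
    Assumes every retraction count is positive (needed for the log start).
    """
    t0 = years[0]
    x = [y - t0 for y in years]
    # Starting values: least squares on log incidence.
    logs = [math.log(r / n) for r, n in zip(retracted, published)]
    xm = sum(x) / len(x)
    lm = sum(logs) / len(logs)
    b = sum((xi - xm) * (li - lm) for xi, li in zip(x, logs)) / sum(
        (xi - xm) ** 2 for xi in x
    )
    a = lm - b * xm
    for _ in range(iters):
        mu = [n * math.exp(a + b * xi) for n, xi in zip(published, x)]
        # Gradient of the Poisson log-likelihood.
        ga = sum(r - m for r, m in zip(retracted, mu))
        gb = sum((r - m) * xi for r, m, xi in zip(retracted, mu, x))
        # 2x2 Fisher information matrix.
        haa = sum(mu)
        hab = sum(m * xi for m, xi in zip(mu, x))
        hbb = sum(m * xi * xi for m, xi in zip(mu, x))
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b

def doubling_time(growth_rate):
    """Doubling time of exp(growth_rate * t): T_d = ln(2) / growth_rate."""
    return math.log(2) / growth_rate

# Synthetic data (not from the paper): output grows 4% per year and
# incidence doubles every 5 years from a 0.01% baseline.
years = list(range(1992, 2022))
published = [round(1e6 * 1.04 ** (y - 1992)) for y in years]
true_rate = math.log(2) / 5
retracted = [
    round(n * 1e-4 * math.exp(true_rate * (y - 1992)))
    for n, y in zip(published, years)
]

a_hat, b_hat = fit_exposure_growth(years, retracted, published)
print("estimated doubling time (years):", doubling_time(b_hat))
```

Because the offset fixes the coefficient of $\log N_t$ at one, the slope describes growth in per-paper incidence rather than growth in raw retraction counts, which is the distinction the abstract draws.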

Paper Structure

This paper contains 30 sections, 12 equations, 21 figures, 13 tables.

Figures (21)

  • Figure 1: Global and temporal patterns of scientific retractions. A. World map of raw retraction counts by country in 2021 (log color scale). Countries with zero retractions (or missing data) are shown in grey. Selected countries are annotated with rounded retraction counts. Inset: Pearson correlation between total publications and retractions by country in 2021, restricted to countries with at least one retraction ($R^2=0.60$, $P\leq 0.05$). B. Annual totals of published works ($N_t$, green squares) and retracted works ($R_t$, orange triangles), 1992--2021. Both series rise over time, illustrating why raw counts alone cannot diagnose changes in per-paper risk. C. Annual retraction incidence (blue circles; percent of works published in year $t$ that are eventually retracted), with an exposure-adjusted GLM fit assuming exponential growth and a negative binomial likelihood (blue line; growth rate significant at $P\leq 10^{-4}$). Shaded band shows the 95% confidence interval for the mean incidence. The legend reports the estimated doubling time $T_d$ with its 95% CI. Incidence is plotted on a logarithmic $y$-axis. The inset shows the same data on a linear scale with the 95% prediction interval.
  • Figure 2: Retraction incidence over time across domains and doubling time by field. A. Annual retraction incidence (blue points) by domain, fit with an exposure-adjusted GLM assuming exponential growth and a negative binomial likelihood (blue line). Shaded bands denote 95% confidence intervals on the mean values ($P$-values and statistical analysis are reported in the Appendix). The legend reports the estimated doubling time $T_d$ with its 95% confidence interval. Incidence is shown on a logarithmic $y$-axis. B. Estimated doubling times ($T_d$) for the incidence of retracted works by field (nested within domains) with strong statistical support for exponential growth. Points show $T_d$ estimates; horizontal bars are 95% confidence intervals. Marker color encodes the field’s retraction incidence percentage in 2021 (log color scale; see color bar). See Table \ref{tbl:abbreviations} for field abbreviations.
  • Figure 3: Retraction incidence over time across countries and publishers. A. Country-level retraction incidence for cases with strong statistical support for exponential growth (blue triangles), with a negative binomial, exposure-adjusted exponential fit (blue line; growth-rate $P\leq 0.05$). Shaded band: 95% CI of the mean values. The legend reports the estimated doubling time $T_d$ (95% CI). Logarithmic $y$-axis. B. Country-level retraction incidence (blue circles) for cases without strong support for any candidate model, shown with a centered 5-year rolling average (blue line) and $\pm$1 SD band (shaded). C. Publisher-level retraction incidence for cases with strong statistical support for exponential growth (cyan triangles), with a negative binomial, exposure-adjusted exponential fit (cyan line; growth-rate $P\leq 0.05$). Shaded band: 95% CI of the mean values. The legend reports $T_d$ (95% CI). Logarithmic $y$-axis. D. Publisher-level retraction incidence (cyan circles) for cases without strong model support, shown with a centered 5-year rolling average (cyan line) and $\pm$1 SD band (shaded).
  • Figure 4: Relative retraction incidence and concentration paradox. A. Relative retraction incidence (see definition in the text) across fields, countries, and publishers in 2021. Each point represents one entity; its position on the $x$-axis is the entity’s relative incidence (log scale). Point size and color intensity encode the total number of retracted works in 2021 (log scale). Entities are grouped by type with dashed horizontal separators; representative labels mark discussed cases. See Table \ref{tbl:abbreviations} for field and publisher abbreviations. B, C. Number of countries required to account for 50%, 80%, and 90% of the global total at five-year intervals (1992--2021), illustrating the concentration paradox. B. Total publications: increasing country counts indicate democratization. C. Total retractions: decreasing country counts indicate concentration in fewer countries. This divergence is reinforced by Gini coefficient analysis (Appendix Fig. \ref{fig:Gini_countries}): the Gini coefficient for publications declines from 0.92 to 0.88, while the Gini coefficient for retractions increases from 0.55 to 0.90, with even more extreme publisher-level concentration (Gini 0.45 to 0.90, Appendix Fig. \ref{fig:Gini_publishers}).
  • Figure A1: Weighted number of published works per country. Four world maps showing the geographic distribution of weighted publication counts in 1992, 2000, 2010, and 2021. Color intensity represents the number of works on a logarithmic scale (countries with no publications are shown in gray).
  • ...and 16 more figures
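
The two concentration measures named in the Figure 4 caption, the Gini coefficient and the number of countries needed to reach a given share of the total, can be sketched as follows. This is a minimal illustration on made-up counts. The function names are hypothetical, and the paper's exact weighting and counting conventions are not given in this excerpt, so the sketch assumes plain unweighted per-country totals.

```python
def gini(values):
    """Gini coefficient of non-negative totals (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-data formula: G = sum_i (2i - n - 1) * x_i / (n * total), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

def entities_for_share(values, share):
    """Smallest number of top entities whose combined total reaches `share`."""
    total = sum(values)
    running = 0.0
    for k, v in enumerate(sorted(values, reverse=True), start=1):
        running += v
        if running >= share * total:
            return k
    return len(values)

# Made-up per-country counts, for illustration only.
counts = [60, 25, 10, 5]
print("Gini:", gini(counts))
for share in (0.5, 0.8, 0.9):
    print(f"countries needed for {share:.0%}:", entities_for_share(counts, share))
```

A rising `entities_for_share` value over time corresponds to the democratization reading in panel B, and a falling one to the concentration reading in panel C.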