The disruption index is biased by citation inflation
Alexander M. Petersen, Felber Arroyave, Fabio Pammolli
TL;DR
The observed decline in disruption over time in citation networks may reflect citation inflation rather than a genuine loss of disruptive impact. The authors combine a deductive analysis of the disruption index $CD_p$, an empirical evaluation on the Microsoft Academic Graph (MAG) dataset, and a Monte Carlo model that incorporates citation-inflation and triadic-closure dynamics. They show that as the reference-list length $r(t)$ grows and extraneous citations ($R_k$) accumulate, the denominator of $CD_p$ inflates and drives $CD_p$ toward 0. This bias persists even for the variant $CD_p^{nok}$. In the model, turning off citation inflation or capping reference-list length can restore time-stationary behavior, and the $CD_5$ distribution is well described by an Extreme Value law. The authors release an openly available ensemble of synthetic citation networks for testing alternative disruption indices, discuss normalization strategies that would allow time-invariant comparisons, and offer policy considerations, such as limiting reference-list lengths to temper citation inflation.
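To make the denominator argument concrete, here is a minimal Python sketch. It assumes the standard CD-index definition, $CD_p = (n_i - n_j)/(n_i + n_j + n_k)$. Here $n_i$ counts later papers that cite the focal paper but none of its references, $n_j$ counts papers that cite both, and $n_k$ (the $R_k$ term above) counts papers that cite only the focal paper's references. The sketch also assumes that $CD_p^{nok}$ simply drops $n_k$ from the denominator. The function names, the zero-denominator convention, and the toy paper IDs are hypothetical illustrations, not the authors' code. The toy isolates only the $n_k$ channel and does not capture the mechanisms by which the authors find $CD_p^{nok}$ is also biased.

```python
from typing import Iterable, Set, Tuple


def disruption_counts(
    focal: str, focal_refs: Set[str], citing_papers: Iterable[Set[str]]
) -> Tuple[int, int, int]:
    """Classify later papers (each given as its reference set) into n_i, n_j, n_k."""
    n_i = n_j = n_k = 0
    for refs in citing_papers:
        cites_focal = focal in refs
        cites_predecessor = not refs.isdisjoint(focal_refs)
        if cites_focal and not cites_predecessor:
            n_i += 1  # cites the focal paper but none of its references
        elif cites_focal:
            n_j += 1  # cites the focal paper and at least one of its references
        elif cites_predecessor:
            n_k += 1  # bypasses the focal paper, cites only its references
    return n_i, n_j, n_k


def cd_index(n_i: int, n_j: int, n_k: int) -> float:
    """CD = (n_i - n_j) / (n_i + n_j + n_k); returns 0.0 by convention if empty."""
    denom = n_i + n_j + n_k
    return (n_i - n_j) / denom if denom else 0.0


def cd_index_nok(n_i: int, n_j: int) -> float:
    """Assumed 'nok' variant: the n_k term is dropped from the denominator."""
    denom = n_i + n_j
    return (n_i - n_j) / denom if denom else 0.0


# Toy example: focal paper "F" cites "A" and "B" (hypothetical IDs).
focal_refs = {"A", "B"}
base = [{"F"}, {"F"}, {"F", "A"}]  # two n_i papers and one n_j paper
for extra_k in (0, 7, 97):
    citing = base + [{"A"}] * extra_k  # extra papers citing only predecessor "A"
    n_i, n_j, n_k = disruption_counts("F", focal_refs, citing)
    print(extra_k, round(cd_index(n_i, n_j, n_k), 4), round(cd_index_nok(n_i, n_j), 4))
```

The numerator stays fixed at $n_i - n_j = 1$ while $n_k$ grows, so $CD_p$ falls from 1/3 to 1/10 to 1/100 even though the focal paper's own citation pattern is unchanged. In this toy, $CD_p^{nok}$ stays at 1/3 because it ignores $n_k$ by construction.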
Abstract
A recent analysis of scientific publication and patent citation networks by Park et al. (Nature, 2023) suggests that publications and patents are becoming less disruptive over time. Here we show that the reported decrease in disruptiveness is an artifact of systematic shifts in the structure of citation networks unrelated to innovation system capacity. Instead, the decline is attributable to 'citation inflation', an unavoidable characteristic of real citation networks that manifests as a systematic time-dependent bias and renders cross-temporal analysis challenging. One driver of citation inflation is the ever-increasing length of reference lists over time, which in turn increases the density of links in citation networks and causes the disruption index to converge to 0. A second driver is a shift in how reference lists are constructed: they are increasingly shaped by self-citations, which raise the rate of triadic closure in citation networks and thus confound efforts to measure disruption, itself a measure of triadic closure. Combined, these two systematic shifts render the disruption index temporally biased and unsuitable for cross-temporal analysis. This systematic bias further stymies efforts to correlate disruption with other measures that are also time-dependent, such as team size and citation counts. To demonstrate this fundamental measurement problem, we present three complementary lines of critique (deductive, empirical and computational modeling), and also make available an ensemble of synthetic citation networks that can be used to test alternative citation-based indices for systematic bias.
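The claim that longer reference lists densify the network can be illustrated with a simple null model. This model is an assumption of the sketch, not the authors' Monte Carlo model. Each paper citing the focal paper draws its other `ref_len` references uniformly at random, without replacement, from a pool of `pool_size` prior papers, `focal_refs` of which are the focal paper's own references. The probability $p$ that such a citer also cites at least one predecessor, and therefore counts toward $n_j$ rather than $n_i$, is hypergeometric: $p = 1 - \binom{N - r_f}{r}/\binom{N}{r}$. Because every citer of the focal paper falls into exactly one of $n_i$ or $n_j$, the expected value of $CD_p^{nok} = (n_i - n_j)/(n_i + n_j)$ under this model is exactly $1 - 2p$. The function names and parameter values are hypothetical.

```python
from math import comb


def prob_cites_predecessor(pool_size: int, focal_refs: int, ref_len: int) -> float:
    """Chance that `ref_len` references drawn uniformly without replacement from
    `pool_size` prior papers include at least one of the focal paper's
    `focal_refs` references (complement of a hypergeometric zero count)."""
    if not 0 <= focal_refs <= pool_size or not 0 <= ref_len <= pool_size:
        raise ValueError("counts must lie between 0 and pool_size")
    # math.comb returns 0 when k > n, so the miss probability correctly drops to 0.
    miss = comb(pool_size - focal_refs, ref_len) / comb(pool_size, ref_len)
    return 1.0 - miss


def expected_cd_nok(p: float) -> float:
    """Each citer is n_j with probability p and n_i otherwise, so E[CD^nok] = 1 - 2p."""
    return 1.0 - 2.0 * p


# Longer reference lists raise the triadic-closure probability p (hypothetical sizes).
for r in (10, 30, 60, 120):
    p = prob_cites_predecessor(pool_size=10_000, focal_refs=30, ref_len=r)
    print(f"r={r:>3}  p={p:.3f}  E[CD_nok]={expected_cd_nok(p):.3f}")
```

Under this uniform null, $p$ rises with $r$ and the expected $CD_p^{nok}$ drifts downward even though nothing about the focal paper changes. Real reference lists are not uniform random draws, so the numbers are only indicative of the direction of the effect.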
