
The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice

Alexander M. Petersen, Felber Arroyave, Fabio Pammolli

Abstract

Measuring the rate of innovation in academia and industry is fundamental to monitoring the efficiency and competitiveness of the knowledge economy. To this end, a disruption index (CD) was recently developed and applied to publication and patent citation networks (Wu et al., Nature 2019; Park et al., Nature 2023). Here we show that CD systematically decreases over time due to secular growth in research and patent production, following two distinct mechanisms unrelated to innovation -- one behavioral and the other structural. Whereas the behavioral explanation reflects shifts associated with techno-social factors (e.g. self-citation practices), the structural explanation follows from 'citation inflation' (CI), an inextricable feature of real citation networks attributable to increasing reference list lengths, which causes CD to systematically decrease. We demonstrate this causal link by way of mathematical deduction, computational simulation, multivariable regression, and quasi-experimental comparison of the disruptiveness of PNAS versus PNAS Plus articles, which differ only in their lengths. Accordingly, we analyze CD data available in the SciSciNet database and find that disruptiveness incrementally increased from 2005 to 2015, and that the negative relationship between disruption and team size is remarkably small in overall effect size, and shifts from negative to positive for teams of $\geq 8$ coauthors.
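
The structural argument can be made concrete with a short worked bound. The sketch below assumes the standard definition of the disruption index from the cited works (Wu et al., Nature 2019), which is not reproduced in this excerpt: $N_{i}$ counts papers citing the focal paper $p$ but none of its references, $N_{j}$ counts papers citing both $p$ and its references, and $N_{k}$ counts papers citing the references but not $p$. The ratio $\rho_{p}$ is notation introduced here for illustration only, and the bound requires $N_{i} + N_{j} > 0$.

```latex
\begin{align}
CD_{p} &= \frac{N_{i} - N_{j}}{N_{i} + N_{j} + N_{k}}, \\
\vert CD_{p} \vert &\leq \frac{N_{i} + N_{j}}{N_{i} + N_{j} + N_{k}}
  = \frac{1}{1 + \rho_{p}},
  \qquad \rho_{p} \equiv \frac{N_{k}}{N_{i} + N_{j}} .
\end{align}
```

Under these assumptions, $N_{k}$ enters only the denominator. If longer reference lists inflate $N_{k}$ faster than $N_{i} + N_{j}$, then $\rho_{p}$ grows and $\vert CD_{p} \vert$ is pushed toward 0, regardless of how the citations to $p$ itself are split between $N_{i}$ and $N_{j}$.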

Paper Structure (7 sections, 5 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Citation inflation is an inextricable feature of citation networks. The disruption index $CD_{p}$ is calculated according to three non-overlapping subsets of $\{c\}_{p} = \{c\}_{i} \cup \{c\}_{j} \cup \{c\}_{k}$, of sizes $N_{i}$, $N_{j}$ and $N_{k}$, respectively. (a,b) Schematic of the citation network sub-graph contributing to the calculation of the disruption index for two papers that differ only in the connectivity of the single reference contributing to $N_{k}$. Moreover, in order to convey the magnitude and impact of secular growth as it manifests on real citation networks, the subset $\{c\}_{k}$ for publication $p_{a}$ is characteristic of citation rates in the 1980s, whereas for $p_{b}$ it is characteristic of the 2000s. Consequently, $CD_{a} = 0.69$ and $CD_{b} = 0.45$, corresponding to a 35% decrease in $CD$ attributable to 20 years of increasing citation network density. (c) Schematic illustrating the inflation of the reference supply owing to the fact that the annual publication rate $n(t)$ (comprised of increasingly variable article lengths), along with the number of references per publication $r(t)$, have grown exponentially over time. Consequently, the observed densification is both within- and across-generation, such that older publications can receive more citations from present-day research than from contemporaneous research due to secular growth. (d) Citation inflation even affects journals with relatively small change in $n(t)$, such as traditional print journals like Nature, which have witnessed 7-fold increases in reference list lengths over the last 60 years.
  • Figure 2: Computational simulation of growing citation networks: after 'turning off' CI, the systematic decline in $CD$ reverses. (a) The average $CD_{5}(t)$ calculated across 10 different computational realizations of (i) the standard CI model and (ii) the CI model with quenched reference list growth ($g_{r}=0$) for $t\geq 108$. (b) Average rate of extraneous citation, $R_{k}(t)$, showing that $CD_{5}(t)$ converges to 0 because the denominator of the disruption index in Eq. (\ref{eqnCD}) is unbounded as $r(t)$ grows.
  • Figure 3: Quasi-experimental test and validation of the CI hypothesis: counterfactual juxtaposition of research articles published in PNAS versus PNAS Plus. (a) Frequency distribution of the absolute disruption index, $\vert CD_{p,5}\vert$. (b) Frequency distribution of the number of references per paper, $r_{p}$. See Fig. \ref{FigureS3.fig} for comparison of the two subsamples across a wider range of characteristics. Dashed vertical bars indicate the subsample means. (c) For both subsamples, the decline in $\vert CD_{p,5}\vert$ is fully attributable to the variation in $r_{p}$, such that the difference in average reference list lengths accounts for the entire, albeit small, difference in average $\vert CD_{p,5}\vert$.
  • Figure 4: Non-linear temporal and team-size trends in CD after controlling for CI. Marginal effects produced by multivariable regression that control for $r_{p}$ and $c_{p}$ (CI), increasing team sizes ($k_{p}$), and the tendency for larger teams to produce longer papers with longer reference lists ($k_{p} \times t$). (a) Results indicate that disruptive science has incrementally increased since 2006 -- which is consistent with three independent re-analyses reported in \cite{bentley2023disruption, macher2023illusive, holst2024dataset}. The magnitude of the effect size ($0.06\sigma$) is relatively small. (b) In contrast to \cite{wu2019large}, results indicate that large teams (incrementally) disrupt and small teams (incrementally) develop science. The magnitude of the effect size ($0.09\sigma$) is inconsequential in terms of team science policy guidance and team assembly strategy. Shown are factor variable point estimates with 95% confidence intervals; gray error bars are not statistically deviant from the baseline level indicated by the horizontal dashed line ($p>0.05$). See Tables \ref{AllFEYear.reg} & \ref{AllFEteamsize.reg} for the full list of model parameter estimates.
  • Figure S1: Citation inflation affects journals of all sizes, even those with relatively small change in $n(t)$. (a) The number of research articles, $n_{t}$, published by Nature tabulated over 5-year intervals from 1960 to 2020. Counts are based upon Clarivate Analytics Web of Science Core Collection, using records classified as document type = "Article" and neglecting articles with $r_{p}=0$, which are likely misclassified editorial comments and the like. (b) The frequency distribution $P_{t} (r_{p})$ shows the distribution of the number of references per article, and indicates that the increasing trend in Figure 1(d) is not attributable to outliers, but rather a systematic shift towards larger $r_{p}$ values.
  • ...and 3 more figures
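
The set-based calculation described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard definition $CD_{p} = (N_{i} - N_{j})/(N_{i} + N_{j} + N_{k})$ from the cited works, the function name `disruption_index` is hypothetical, and the paper ids and counts are invented toy data rather than the values behind $CD_{a}$ and $CD_{b}$.

```python
def disruption_index(citers_of_focal, citers_of_refs):
    """Toy CD index from two sets of citing-paper ids (hypothetical helper).

    citers_of_focal: ids of papers that cite the focal paper p
    citers_of_refs:  ids of papers that cite at least one reference of p
    """
    focal = set(citers_of_focal)
    refs = set(citers_of_refs)
    n_i = len(focal - refs)  # cite p but none of its references
    n_j = len(focal & refs)  # cite both p and its references
    n_k = len(refs - focal)  # cite the references but not p
    total = n_i + n_j + n_k
    if total == 0:
        raise ValueError("CD is undefined when there are no citing papers")
    return (n_i - n_j) / total


# Invented toy data: the same 10 papers cite the focal paper in both cases,
# and 2 of them (ids 9 and 10) also cite its references, so N_i = 8, N_j = 2.
citers_of_focal = set(range(1, 11))

# Sparse network: only 2 further papers cite the references (N_k = 2).
sparse_refs = {9, 10, 11, 12}

# Dense network: 18 further papers cite the references (N_k = 18).
dense_refs = {9, 10} | set(range(11, 29))

cd_sparse = disruption_index(citers_of_focal, sparse_refs)
cd_dense = disruption_index(citers_of_focal, dense_refs)
print(f"sparse: {cd_sparse:.3f}, dense: {cd_dense:.3f}")
```

In this toy setup the citations received by the focal paper are identical in both cases; only the number of papers citing its references changes, which is the situation sketched in panels (a,b) of Figure 1.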