Uncovering simultaneous breakthroughs with a robust measure of disruptiveness

Munjung Kim, Sadamori Kojaku, Yong-Yeol Ahn

TL;DR

This study addresses the discreteness and locality limitations of the disruption index by introducing the Embedding Disruptiveness Measure (EDM), a continuous disruptiveness metric derived from a directional graph embedding that learns past and future context vectors for each paper. The disruptiveness of paper $i$, $\Delta_i$, is quantified as the cosine distance between these two vectors, enabling robust detection of both single and simultaneous breakthroughs across massive citation networks. Empirical results on Web of Science, APS, Nobel Prize, Milestone papers, and patents show that $\Delta$ flags disruptive works more accurately than the disruption index $D$, reduces degeneracy, and reveals simultaneous disruptions that were previously hidden. The framework offers a scalable, principled lens for studying scientific progress and the drivers of breakthrough discoveries, with clear applicability to identifying landmark research and the dynamics of simultaneous innovations.
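
As a rough illustration of the last step only, the sketch below computes a cosine distance between a "future" vector and a "past" vector. It assumes the two vectors have already been learned; the function name `cosine_distance` and the toy vectors are hypothetical, and the convention $1 - \cos\theta$ is an assumption about how "cosine distance" is defined here, not a detail taken from the paper.

```python
import math


def cosine_distance(future_vec, past_vec):
    """Return 1 - cos(angle) between two equal-length vectors.

    Assumed convention: 0 means same direction, 1 orthogonal, 2 opposite.
    """
    if len(future_vec) != len(past_vec):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(future_vec, past_vec))
    norm_f = math.sqrt(sum(a * a for a in future_vec))
    norm_p = math.sqrt(sum(b * b for b in past_vec))
    if norm_f == 0.0 or norm_p == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_f * norm_p)


# Toy vectors (made up for illustration, not taken from any dataset).
developing = cosine_distance([1.0, 0.2], [0.9, 0.3])   # future close to past
disruptive = cosine_distance([1.0, 0.0], [-0.2, 1.0])  # future far from past
print(developing < disruptive)
```

Under this convention a paper whose future and past vectors point in nearly the same direction gets a value near 0, while a paper whose descendants sit far from its antecedents in the embedding space gets a larger value.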

Abstract

Progress in science and technology is punctuated by disruptive innovation and breakthroughs. Researchers have characterized these disruptions to explore the factors that spark such innovations and to assess their long-term trends. However, although understanding disruptive breakthroughs and their drivers hinges upon accurately quantifying disruptiveness, the core metric used in previous studies -- the disruption index -- remains insufficiently understood and tested. Here, after demonstrating the critical shortcomings of the disruption index, including its conflicting evaluations for simultaneous discoveries, we propose a new, continuous measure of disruptiveness based on a neural embedding framework that addresses these limitations. Our measure not only better distinguishes disruptive works, such as Nobel Prize-winning papers, from others, but also reveals simultaneous disruptions by allowing us to identify the "twins" that have the most similar future context. By offering a more robust and precise lens for identifying disruptive innovations and simultaneous discoveries, our study provides a foundation for deepening insights into the mechanisms driving scientific breakthroughs while establishing a more equitable basis for evaluating transformative contributions.

Paper Structure

This paper contains 18 sections, 25 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: The disruption index has critical limitations due to its discreteness and locality. (a) The disruption index quantifies the degree to which descendant works rely solely on a focal work and are free from its antecedent works. The disruption index of a focal paper reaches its minimum value $D=-1$ when all future works citing the focal work also cite the prior works referenced by the focal work. Conversely, the focal work is maximally disruptive ($D=1$) when all future works cite the focal work without citing any of the past works it references. (b) The disruption index is extremely sensitive; even a single missing citation can shift it from $-1$ to $1$. (c) $D$ exhibits high degeneracy, taking the same value for different citation topologies. (d) The index captures only the local structure formed by directly connected papers, neglecting any structure beyond the immediate vicinity. (e) When two papers jointly or simultaneously create a disruption and receive equal recognition from descendants, even a single citation can turn them from maximally disruptive to minimally disruptive.
  • Figure 1: The disruption index $D$ and its variants have high degeneracy, while our new disruptiveness measure $\Delta$ does not. Two variants of $D$ are explored: $D_{\text{no}k}$, where the dominant influence of the $n_k$ term is mitigated; and $D_{5}$, the disruption index considering only citations occurring 5 years after the publication of the focal paper. For both APS papers ($n=327,021$) and WOS papers ($n=23,664,187$), the two variants show higher degeneracy than the original index $D$.
  • Figure 2: Directional graph embedding captures disruptiveness. Unlike the disruption index, our embedding approach leverages the entire network structure to estimate the disruptiveness of each paper. This approach separately represents the citing and cited features of papers (see Methods for a detailed explanation of the algorithm). (a) First, we generate random walks (blue arrow) on the citation network. (b) Our model learns two vectors ("future" and "past") for each paper that can be used to accurately predict 'what comes before' (future vector) and 'what comes next' (past vector) in the random walk trajectories. (c) As a result, the future vector $\mathbf{f}$ approaches the vectors of descendant papers, while the past vector $\mathbf{p}$ approaches the vectors of antecedent papers. (d) For a developing paper, the vectors representing antecedent works and descendant works are close in the embedding space because the descendant works rely heavily on the antecedent works. This brings the future vector $\mathbf{f}$ and the past vector $\mathbf{p}$ closer together. (e) For a disruptive paper, on the other hand, the distance between the future vector and the past vector becomes greater, because the fewer connections between antecedent and descendant papers place their representation vectors far apart in the latent space.
  • Figure 2: The Embedding Disruptiveness Measure (EDM) addresses locality issues inherent in the disruption index. The Spearman's rank correlation between $D$ and $\Delta$ increases ($n=327,021$) when the second citation step is considered in the calculation of $D$. This highlights the capacity of $\Delta$ to capture information beyond what is available from a single citation step.
  • Figure 3: The Embedding Disruptiveness Measure (EDM) better captures disruptive works as well as simultaneous disruptions, which are obscured by the disruption index. (a) The disruption index $D$ has higher degeneracy than the embedding disruptiveness measure $\Delta$. $D$ is highly degenerate at specific values such as 1, 0.5, and 0.25 (see Figure \ref{fig:problems}). (b) The $D$ index of 302 Nobel Prize-winning and 278 milestone papers shows a bimodal distribution, mainly attributed to the failure of $D$ to consider simultaneous disruptions. Remarkably, $\Delta$ ranks most of them as highly disruptive, eliminating the bimodal distribution. In addition, the distributions of the two indices under randomization of the citation network highlight the influence of citations and references on $D$ scores. (c) The change in the percentile of $\Delta$ at the individual paper level varies between the randomized citation network and the original citation network. In contrast, the percentile of $D$ either shifts drastically due to the sensitivity of the index or remains nearly unchanged. (d) Firth's logistic regression shows that $\Delta$ correlates more strongly with the likelihood of papers becoming milestones among 327,021 APS papers or Nobel Prize-winning papers among 23,664,187 WOS papers, with higher and more statistically significant odds ratios than $D$. Error bars represent the 95% confidence interval. (e) Examples of papers involved in simultaneous discoveries that were overlooked by the disruption index $D$ but effectively captured by EDM with high $\Delta$ scores. The $D$ scores for these papers were positioned around the bottom 1%, contrasting with a potential ranking higher than the top 5% if not for the impact of mutual citation links. The downward-pointing triangles on the arrows indicate the percentiles of $\Delta$ and $D$ for the simultaneous discovery papers.
  • ...and 5 more figures
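
The behavior of the disruption index described in the first Figure 1 caption can be reproduced with a small sketch. It assumes the commonly used definition $D = (n_i - n_j)/(n_i + n_j + n_k)$, where $n_i$ counts papers citing only the focal paper, $n_j$ counts papers citing both the focal paper and its references, and $n_k$ counts papers citing only the references; the captions mention the $n_k$ term but do not spell out this formula. The function name `disruption_index`, the edge-set representation, and the toy paper labels are hypothetical.

```python
def disruption_index(focal, citations):
    """Disruption index of `focal` from a set of (citing, cited) pairs.

    Assumed definition: D = (n_i - n_j) / (n_i + n_j + n_k).
    Returns None when no paper cites the focal paper or its references.
    """
    refs = {cited for citing, cited in citations if citing == focal}
    cite_focal = {citing for citing, cited in citations if cited == focal}
    cite_refs = {
        citing
        for citing, cited in citations
        if cited in refs and citing != focal
    }
    n_i = len(cite_focal - cite_refs)  # cite the focal paper only
    n_j = len(cite_focal & cite_refs)  # cite the focal paper and its references
    n_k = len(cite_refs - cite_focal)  # cite the references only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else None


# Sensitivity (panel b): one citing paper C, one reference R.
edges = {("F", "R"), ("C", "F"), ("C", "R")}
print(disruption_index("F", edges))                 # -1.0
print(disruption_index("F", edges - {("C", "R")}))  # 1.0

# Simultaneous discovery (panel e): descendants C1 and C2 cite both twins A and B.
twins = {("C1", "A"), ("C1", "B"), ("C2", "A"), ("C2", "B")}
print(disruption_index("B", twins))                 # 1.0
print(disruption_index("B", twins | {("B", "A")}))  # -1.0
```

In the twin example, a single citation from B to A makes every descendant count toward $n_j$ for B, which flips its score from the maximum to the minimum even though its recognition by later work is unchanged.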