Scientific and technological knowledge grows linearly over time

Huquan Kang, Luoyi Fu, Russell J. Funk, Xinbing Wang, Jiaxin Ding, Shiyu Liang, Jianghao Wang, Lei Zhou, Chenghu Zhou

TL;DR

This work evaluated knowledge as a collective thinking structure, using citation networks as a representation, by examining extensive datasets that include 213 million publications (1800-2020) and 7.6 million patents (1976-2020).

Abstract

The past few centuries have witnessed a dramatic growth in scientific and technological knowledge. However, the nature of that growth - whether exponential or otherwise - remains controversial, perhaps partly due to the lack of quantitative characterizations. We evaluated knowledge as a collective thinking structure, using citation networks as a representation, by examining extensive datasets that include 213 million publications (1800-2020) and 7.6 million patents (1976-2020). We found that knowledge - which we conceptualize as the reduction of uncertainty in a knowledge network - grew linearly over time in naturally formed citation networks that themselves expanded exponentially. Moreover, our results revealed inflection points in the growth of knowledge that often corresponded to important developments within fields, such as major breakthroughs, new paradigms, or the emergence of entirely new areas of study. Around these inflection points, knowledge may grow rapidly or exponentially on a local scale, although the overall growth rate remains linear when viewed globally. Previous studies concluding an exponential growth of knowledge may have focused primarily on these local bursts of rapid growth around key developments, leading to the misconception of a global exponential trend. Our findings help to reconcile the discrepancy between the perceived exponential growth and the actual linear growth of knowledge by highlighting the distinction between local and global growth patterns. Overall, our findings reveal major science development trends for policymaking, showing that producing knowledge is far more challenging than producing papers.
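
The central comparison in the abstract, linear versus exponential growth, can be illustrated with a small model-selection sketch. The code below is not the authors' pipeline. It fits both models to a synthetic series, uses a chronological 70/30 fit/forecast split, and scores them with a least-squares form of the Akaike information criterion. The function names, the log-linear fitting of the exponential model, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(t, y):
    """Ordinary least squares fit of y = slope * t + intercept."""
    slope, intercept = np.polyfit(t, y, 1)
    return lambda x: slope * x + intercept

def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by least squares on log(y); requires y > 0.
    Assumption: a log-linear fit stands in for a full nonlinear fit."""
    b, log_a = np.polyfit(t, np.log(y), 1)
    return lambda x: np.exp(log_a + b * x)

def aic(y, y_hat, k):
    """Least-squares AIC: n * ln(RSS / n) + 2k, with k fitted parameters."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * k

def compare_growth_models(t, y, train_frac=0.7):
    """Fit on the first train_frac of the series, forecast the remainder."""
    split = int(len(t) * train_frac)
    t_train, y_train = t[:split], y[:split]
    t_test, y_test = t[split:], y[split:]
    report = {}
    for name, fit in (("linear", fit_linear), ("exponential", fit_exponential)):
        model = fit(t_train, y_train)
        forecast_error = y_test - model(t_test)
        report[name] = {
            "aic": float(aic(y_train, model(t_train), k=2)),
            "forecast_rmse": float(np.sqrt(np.mean(forecast_error ** 2))),
        }
    return report

# Synthetic, linearly growing series with noise (NOT data from the paper).
rng = np.random.default_rng(0)
years = np.arange(0, 100, dtype=float)
series = 10.0 + 0.8 * years + rng.normal(0.0, 1.0, size=years.size)

report = compare_growth_models(years, series)
for name, scores in report.items():
    print(name, scores)
```

A lower AIC on the fitting window and a lower forecast error on the held-out window both point to the better-suited model. On this synthetic linear series the linear model should win on both counts.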

Paper Structure

This paper contains 19 sections, 5 equations, and 5 figures.

Figures (5)

  • Figure 1: Overview of the measurement approach. (A), The relationship between knowledge, structure, and entropy. Humans discover unknown information to stretch the boundaries of known information, which is measured by Shannon entropy [shannon1948a] in information bits. By structuring known information, humans learn knowledge, which emerges through transforming isolated, disordered data into interconnected, ordered data. Structural entropy [angsheng2016structural] has been shown capable of quantifying information organized in a network structure, here a hierarchical community structure. By quantifying information with and without structure, the knowledge represented by the structure is quantified as the difference between the two. (B), The formulas of the Knowledge Quantification Index (KQI), structural entropy, and Shannon entropy. The algorithmic implementation of KQI uses an equivalent form of the formula (see Methods). (C), The principle of the KQI applied to a hypothetical citation network. The citation network is first decomposed into multiple trees, and a virtual node acting as the source of all knowledge connects all the roots. The tree structure represents the knowledge formed in ideological inheritance. The KQI quantifies the potential inheritance structure of beliefs within the network. The mathematical expressions of KQI (green) are equal to the difference between Shannon entropy (blue) and structural entropy (brown).
  • Figure 2: Linear growth of knowledge. (A-D), Growth of KQI and the number of publications or patents in MAG and Patents View. The green lines and red curves show regression fits for the linear and exponential models, respectively. The first 70% of the data is used for regression fitting (solid line), and the last 30% is used for forecasting (dashed line). The shaded bands represent the 95% confidence interval. The coefficient of determination and the information criterion indicate which trend better suits a given data series. (E-F), Growth of KQI and number of publications in the disciplines of mathematics (green), psychology (orange), computer science (red), and biology (blue). Straight lines exhibit trends approaching linearity starting from certain years.
  • Figure 3: Duration of mathematical conjecture proving. The green scatter points show the duration (from formulation to completion of the proof, in years) of mathematical conjectures proved since 1960, with several notable examples highlighted. The solid black line is the least-squares linear regression, and the blue shaded band represents the 95% confidence interval. Spearman and Kendall rank correlation coefficients indicate a weak relationship between the duration of conjecture proving and the priority year of proof. The two-sided Cox-Stuart and Mann-Kendall hypothesis tests show that the duration of the mathematical conjectures' proofs has not changed significantly.
  • Figure 4: Inflection points in KQI evolution. (A), Inflection points in distinct disciplines. The green circles represent the KQIs computed for the entire network in different years. The blue circles represent the maximum disruption of papers published in each year. The green lines are the segmented linear regression results. The red lines denote the estimated inflection points, and the red shaded bands represent the standard deviations. The upper insets show the normal distribution that we assumed for the inflection points. The right insets show the transformed distribution of inflection points with maximum disruption. The potentially relevant key events are also marked. (B), Distribution of inflection points with maximum disruption. The inflection density (see Methods) is plotted as a black line, while a blue-shaded band indicates the 95% confidence interval. The y-axis is scaled using symlog, producing a linear plot within the specified range of values near zero (<1). Two regions of interest in the density curve are highlighted using different colored shaded regions. During the evolution of the network, on average, one inflection point occurs as the maximum disruption increases from 0 to 0.69. There is a 95% probability of experiencing at least one inflection as the maximum disruption increases from 0 to 0.76. (C), The distribution of inflection points for the 311 disciplines summarized in (B). Each line represents the evolving network of a discipline, and its extent on the x-axis corresponds to the range of maximum disruption for the evolving network.
  • Figure 5: Pareto principle and diminishing returns in KQI. (A), Cumulative KQI distribution after sorting publications in descending order of KQI. The grey bars show a histogram of the publications, sorted in descending order of KQI and weighted by KQI, expressing the contribution of each range of publications to the total KQI. The green and red lines are the cumulative curves of the KQI distribution; at their critical point, the share of publications belonging to the vital few equals the share of the total KQI contributed by the trivial many. The main plot displays statistics for publications in MAG, while the inset plot displays statistics for patents in Patents View. (B), Illustration of the law of diminishing returns. (C-E), Marginal KQI (see Methods) increment per publication (or patent) over time. The circle markers represent the average incremental KQI from each publication (or patent) in that year. The solid curves show the local regression, and the shaded bands indicate the 95% confidence interval.
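
The Figure 1 caption describes knowledge as the difference between information measured without structure (Shannon entropy) and with structure (structural entropy). The sketch below illustrates that idea on a toy undirected graph only. It is not the KQI algorithm, which per the caption operates on a tree decomposition of a citation network with a virtual root. Here the unstructured term is the Shannon entropy of the degree-proportional node distribution, and the structured term is a two-level structural entropy for a given partition into modules, written from the commonly cited partition form of structural entropy. That formula choice, the function names, and the toy graph are all assumptions.

```python
import math
from collections import defaultdict

def node_degrees(edges):
    """Degree of each node in an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def degree_entropy(edges):
    """Shannon entropy (bits) of the degree-proportional node distribution."""
    degree = node_degrees(edges)
    total = 2 * len(edges)
    return -sum(d / total * math.log2(d / total) for d in degree.values())

def partition_structural_entropy(edges, module_of):
    """Two-level structural entropy (bits) for a given node partition.
    Assumed form: a within-module term plus a term for edges leaving modules."""
    degree = node_degrees(edges)
    total = 2 * len(edges)
    volume = defaultdict(int)  # sum of degrees inside each module
    cut = defaultdict(int)     # edges with exactly one endpoint in the module
    for node, d in degree.items():
        volume[module_of[node]] += d
    for u, v in edges:
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1
    within = -sum(
        d / total * math.log2(d / volume[module_of[node]])
        for node, d in degree.items()
    )
    between = -sum(
        cut[m] / total * math.log2(volume[m] / total) for m in volume
    )
    return within + between

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
module_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

unstructured = degree_entropy(edges)
structured = partition_structural_entropy(edges, module_of)
gain = unstructured - structured  # uncertainty removed by the structure
print(f"{unstructured:.4f} - {structured:.4f} = {gain:.4f} bits")
```

For this toy graph the difference works out analytically to 6/7 of a bit: each module holds 6 of the 14 degree units on internal edges, and knowing the module halves the remaining uncertainty about those units.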