
Trajectories of Change: Approaches for Tracking Knowledge Evolution

Raphael Schlattmann, Malte Vogl

TL;DR

This work examines how knowledge evolves by tracking micro-level author trajectories against macro-level field change within socio-epistemic networks (SEN). It combines two complementary approaches to capture semantic shifts and topic centrality over two-year intervals: a KL-divergence-based analysis on unigram language models, $D(M_d \| M_q)$, and a density-based embedding trajectory method (EDE) using transformer-derived document embeddings. Case studies of Silk and Treder within General Relativity and Gravitation illustrate how individuals can diverge from or align with global field trends: Silk tends toward terminology that later becomes mainstream, while Treder gravitates toward foundational, past-oriented themes. The two methods yield convergent yet distinct insights, underscoring their complementarity. The framework offers a scalable, data-driven means to quantify micro-macro knowledge evolution, with potential extensions to full-text analysis and citation networks, and is intended to be interpretable for historians of science and scholars studying scientific development.
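
For reference, the divergence named above has the standard Kullback-Leibler form shown below. The summation over a shared vocabulary $V$ and the base-2 logarithm are assumptions made here for illustration; the summary itself only gives the symbol $D(M_d \| M_q)$.

$$
D(M_d \| M_q) = \sum_{w \in V} P(w \mid M_d) \, \log_2 \frac{P(w \mid M_d)}{P(w \mid M_q)}
$$

Each summand is the contribution of a single term $w$. Reporting the summands separately gives a pointwise view of the divergence, and adding them up gives a summed view.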

Abstract

We explore local vs. global evolution of knowledge systems through the framework of socio-epistemic networks (SEN), applying two complementary methods to a corpus of scientific texts. The framework comprises three interconnected layers (social, semiotic (material), and semantic), proposing a multilayered approach to understanding structural developments of knowledge. To analyse diachronic changes on the semantic layer, we first use information-theoretic measures based on relative entropy to detect semantic shifts, assess their significance, and identify key driving features. Second, variations in document embedding densities reveal changes in semantic neighbourhoods, tracking how concentrations of similar documents increase, remain stable, or disperse. This enables us to trace document trajectories based on content (topics) or metadata (authorship, institution). Case studies of Joseph Silk and Hans-Jürgen Treder illustrate how individual scholars' work aligns with broader disciplinary shifts in general relativity and gravitation research, demonstrating the applications, limitations, and further potential of this approach.

Paper Structure (20 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Relative Token Usage over Time for Silk (left) and Treder (right). The relative frequency of each token is calculated by dividing its occurrence by the total number of tokens within each two-year interval, focusing on the 20 most frequently used terms. To reduce the impact of years with insufficient data, any intervals with fewer than 50 total tokens were excluded. Terms were expanded, grouped by two-year bins, and scaled, with KDE applied to smooth frequencies for clearer trends.
  • Figure 2: Summed, synchronous Kullback-Leibler Divergence (KLD) over time for Silk (top) and Treder (bottom), showing the development of divergence in summed term usage relative to the full corpus. Unigram models were generated for each non-overlapping time slice, with Jelinek-Mercer smoothing ($\lambda = 0.05$) applied to avoid zero probabilities. Only highly significant terms, filtered via Welch's t-test ($\alpha = 0.05$), were retained.
  • Figure 3: Synchronous, pointwise Kullback-Leibler Divergence (KLD) over time for Silk (top) and Treder (bottom), showing the development of divergence in individual term usage relative to the full corpus. The blue and light blue baselines represent two terms that are among those with the lowest cumulative KLD values across all slices ("gravitational" and "gravity"). The red lines highlight terms that are among those with the highest cumulative KLD values across all slices ("galaxy" for Silk and "mach" for Treder).
  • Figure 4: Asynchronous, summed Kullback-Leibler Divergence (KLD) for Silk (top) and Treder (bottom), showing divergence in term usage for each time slice relative to all other slices in the full corpus. The x-axis shows the time difference between each time slice of the individual corpus and the slice of the full corpus it is compared with. The y-axis represents the time slices of the individual corpus. Years with the lowest divergence values are highlighted in red boxes. For instance, the bottom slice on the y-axis (1957-1958) is compared forward from 0 up to +43 years (1999-2000), while the top slice (1999-2000) is compared backward from 0 down to -43 years (1957-1958).
  • Figure 5: Embeddings Density Estimation (EDE) over time for publications by Silk (top) and Treder (bottom), showing shifts in density around their publications across time slices. The thick black line represents the median value of all publications, indicating the general trend.
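
The smoothed unigram comparison described in the captions of Figures 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy token lists, the use of the pooled token counts as the smoothing background, the convention that $\lambda$ weights the background model, and the base-2 logarithm are all assumptions. The Welch's t-test filtering step is omitted.

```python
import math
from collections import Counter


def unigram_model(tokens, background, lam=0.05):
    """Jelinek-Mercer smoothed unigram model (hypothetical helper).

    Assumption: lam weights the background distribution, so every term
    that occurs in the background receives a non-zero probability.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    bg_total = sum(background.values())
    return {
        w: (1 - lam) * (counts[w] / total if total else 0.0)
        + lam * background[w] / bg_total
        for w in background
    }


def pointwise_kld(p, q):
    """Per-term contributions p(w) * log2(p(w) / q(w))."""
    return {w: p[w] * math.log2(p[w] / q[w]) for w in p}


def summed_kld(p, q):
    """Total divergence D(p || q): the sum of all pointwise contributions."""
    return sum(pointwise_kld(p, q).values())


# Toy data for one time slice (invented purely for illustration).
author_tokens = ["galaxy", "galaxy", "galaxy", "gravity", "field"]
field_tokens = ["gravity", "gravity", "field", "field", "metric", "galaxy"]

# Assumption: the pooled counts of both token lists serve as background.
background = Counter(author_tokens) + Counter(field_tokens)

p_author = unigram_model(author_tokens, background)
p_field = unigram_model(field_tokens, background)

contributions = pointwise_kld(p_author, p_field)
top_term = max(contributions, key=contributions.get)
print(top_term, round(summed_kld(p_author, p_field), 3))
```

Repeating this per two-year slice and plotting `summed_kld` against time would give a curve in the spirit of Figure 2, while tracking individual entries of `pointwise_kld` corresponds to the per-term view of Figure 3. Comparing an author slice against field slices from other years, rather than the same years, corresponds to the asynchronous setup of Figure 4.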