
A Statistical Framework for Detecting Emergent Narratives in Longitudinal Text Corpora

Cynthia Medeiros, John Quigley, Matthew Revie

TL;DR

The paper proposes a statistical framework, based on Latent Dirichlet Allocation (LDA), for detecting narrative emergence in longitudinal text corpora. Its findings indicate that model-based topic trajectories can reflect identifiable shifts in economic discourse and provide a statistically grounded basis for analysing thematic change in longitudinal textual data.

Abstract

Narratives about economic events and policies are widely recognised as influential drivers of economic and business behaviour. Yet the statistical identification of narrative emergence remains underdeveloped. Narratives evolve gradually, exhibit subtle shifts in content, and may exert influence disproportionate to their observable frequency, making it difficult to determine when observed changes reflect genuine structural shifts rather than routine variation in language use. We propose a statistical framework for detecting narrative emergence in longitudinal text corpora using Latent Dirichlet Allocation (LDA). We define emergence as a sustained increase in a topic's relative prominence over time and articulate a statistical framework for interpreting such trajectories, recognising that topic proportions are latent, model-estimated quantities. We illustrate the approach using a corpus of academic publications in economics spanning 1970-2018, where Nobel Prize-recognised contributions serve as externally observable signals of influential narratives. Topics associated with these contributions display sustained increases in estimated prevalence that coincide with periods of heightened citation activity and broader disciplinary recognition. These findings indicate that model-based topic trajectories can reflect identifiable shifts in economic discourse and provide a statistically grounded basis for analysing thematic change in longitudinal textual data.
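To make the notion of "a sustained increase in a topic's relative prominence" concrete, the following is a minimal sketch rather than the authors' procedure. It assumes that annual prevalence is an unweighted mean of per-document topic proportions and that a least-squares slope is an acceptable first summary of an upward trend; neither choice is confirmed by the abstract. The function names `annual_topic_prevalence` and `trend_slope` and the toy data are hypothetical.

```python
import numpy as np


def annual_topic_prevalence(theta, years):
    """Average document-topic proportions within each publication year.

    theta: (n_docs, n_topics) array whose rows sum to 1 (e.g. LDA output).
    years: (n_docs,) array of integer publication years.
    Assumption: prevalence is an unweighted mean over the year's documents.
    """
    theta = np.asarray(theta, dtype=float)
    years = np.asarray(years)
    unique_years = np.unique(years)  # sorted ascending
    means = np.vstack([theta[years == y].mean(axis=0) for y in unique_years])
    return unique_years, means


def trend_slope(unique_years, series):
    """Least-squares slope of a prevalence series per year.

    Hypothetical summary of 'sustained increase'; years are centred
    before fitting to keep the regression well conditioned.
    """
    x = np.asarray(unique_years, dtype=float)
    x = x - x.mean()
    slope, _intercept = np.polyfit(x, np.asarray(series, dtype=float), deg=1)
    return float(slope)


# Toy corpus: six documents, two topics, three years (illustrative only).
theta = np.array([
    [0.9, 0.1], [0.8, 0.2],   # 1970
    [0.6, 0.4], [0.5, 0.5],   # 1971
    [0.3, 0.7], [0.2, 0.8],   # 1972
])
years = np.array([1970, 1970, 1971, 1971, 1972, 1972])

yrs, prevalence = annual_topic_prevalence(theta, years)
slope_topic_1 = trend_slope(yrs, prevalence[:, 1])
```

A positive slope alone does not establish emergence, since the topic proportions are latent, model-estimated quantities; any threshold or test applied to such a slope would need to account for that estimation uncertainty.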

Paper Structure (8 sections, 3 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Flowchart illustrating the stages of a Narrative Shift in academic thought. In our case study, we are mainly concerned with using NLP to identify the changes that occurred in Step 3.
  • Figure 2: Time series of $\bar{\theta}_{k,t}$ for selected Nobel-associated topics across subfields, 1970–2018. The series indicate sustained increases in Finance–Topic 7 and Macroeconomics–Topic 12, alongside more moderate upward movements in the other topics.
  • Figure 3: Topic prevalence and citation trajectories for selected Nobel-associated topics, 1970–2018. Upper panels show annual aggregated topic prevalence $\bar{\theta}_{k,t}$; lower panels show citation counts for the corresponding Nobel-recognised contributions. For jointly associated topic-contribution pairs, Bernanke citations are shown in grey and Diamond-Dybvig citations in the topic colour. The panels illustrate both contemporaneous alignment (e.g. Finance–Topic 7) and apparent lead–lag patterns (e.g. Labour–Topic 7), motivating the formal lag analysis.
  • Figure 4: Lagged correlations between annual topic prevalence $\bar{\theta}_{k,t}$ and citation counts for selected Nobel-associated topics. The horizontal axis denotes lag in years; positive values indicate that topic prevalence precedes citation growth. Finance–Topic 7 and Macroeconomics–Topic 12 exhibit high correlations near lag zero, while Growth–Topic 9 displays increasing correlation at positive lags, consistent with topic prevalence leading subsequent citation accumulation.
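
The lag convention in the Figure 4 caption can be illustrated with a short sketch. It assumes a plain Pearson correlation between the two annual series, which may differ from the estimator used in the paper, and follows the caption's sign convention: at a positive lag, prevalence in year t is paired with citations in year t + lag. The function name `lagged_correlation` and the toy series are hypothetical.

```python
import numpy as np


def lagged_correlation(prevalence, citations, max_lag):
    """Pearson correlation between prevalence[t] and citations[t + lag].

    Positive lag: topic prevalence precedes citations (caption's convention).
    Both inputs are equal-length annual series aligned on the same years.
    Assumption: Pearson correlation on the overlapping years only.
    """
    prevalence = np.asarray(prevalence, dtype=float)
    citations = np.asarray(citations, dtype=float)
    if prevalence.shape != citations.shape:
        raise ValueError("series must have the same length")
    n = len(prevalence)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = prevalence[: n - lag], citations[lag:]
        else:
            x, y = prevalence[-lag:], citations[: n + lag]
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


# Toy series: citations repeat the prevalence pattern two years later,
# so the correlation peaks at lag +2 by construction.
toy_prevalence = np.array([1, 3, 2, 5, 4, 7, 6, 9, 8, 10], dtype=float)
toy_citations = np.concatenate([[0.0, 0.0], toy_prevalence[:-2]])

corr_by_lag = lagged_correlation(toy_prevalence, toy_citations, max_lag=3)
best_lag = max(corr_by_lag, key=corr_by_lag.get)
```

Because each lag shortens the overlapping window, correlations at large lags rest on fewer years, and trending series can correlate strongly for reasons unrelated to any lead–lag relationship; both points matter when reading curves like those in Figure 4.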