ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Aniket Pramanick; Yufang Hou; Saif M. Mohammad; Iryna Gurevych

Abstract

Scientific papers do more than report results: they advance $\textit{claims}$ that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these interactions explicit at the level of individual scientific claims. We introduce $\texttt{ClaimFlow}$, a claim-centric view of the NLP literature, built from $304$ ACL Anthology papers (1979--2025) that are manually annotated with $1{,}084$ claims and $832$ cross-paper claim relations, indicating whether a citing paper $\textit{supports}$, $\textit{extends}$, $\textit{qualifies}$, $\textit{refutes}$, or references a claim as $\textit{background}$. Using $\texttt{ClaimFlow}$, we define a new task, $\textit{Claim Relation Classification}$, which requires models to infer the scientific stance toward a cited claim from the text and citation context. Evaluating strong neural models and large language models on this task, we report baseline performance of $0.78$ macro-F1, highlighting that claim-relation classification is feasible but challenging. We further apply our model to $\sim 13$k NLP papers to analyze how claims evolve across decades of NLP research. Our analysis reveals that $63.5$% of claims are never reused and only $11.1$% are ever challenged; meanwhile, widely propagated claims are more often $\textit{reshaped}$ through qualification and extension than directly confirmed or refuted. Overall, $\texttt{ClaimFlow}$ offers a lens for examining how ideas shift and mature within NLP, and a foundation for assessing whether models can interpret scientific argumentation.
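To make the task and the reported metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical edge schema (`ClaimRelation` and its field names are illustrative), computes macro-F1 as the unweighted mean of per-label F1 over the five relation labels, and treats a claim as "never reused" when no edge cites it; the paper may define reuse differently.

```python
from dataclasses import dataclass

# The five relation labels named in the abstract.
RELATIONS = ("supports", "extends", "qualifies", "refutes", "background")


@dataclass(frozen=True)
class ClaimRelation:
    """One directed edge: a citing claim's stance toward a cited claim.

    Hypothetical schema for illustration; not the dataset's actual format.
    """
    citing_claim: str
    cited_claim: str
    relation: str


def macro_f1(gold, pred, labels=RELATIONS):
    """Unweighted mean of per-label F1.

    A label with no true positives, false positives, or false negatives
    contributes 0 to the mean (an assumption of this sketch).
    """
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def never_reused_fraction(all_claims, edges):
    """Share of claims that no edge points to as the cited claim."""
    cited = {e.cited_claim for e in edges}
    return sum(c not in cited for c in all_claims) / len(all_claims)


if __name__ == "__main__":
    # Toy data only; these values have no connection to the paper's results.
    gold = ["supports", "extends", "qualifies", "refutes", "background"]
    pred = ["supports", "extends", "extends", "refutes", "background"]
    print(f"macro-F1 on toy data: {macro_f1(gold, pred):.3f}")

    edges = [ClaimRelation("c2", "c1", "extends")]
    print(never_reused_fraction(["c1", "c2", "c3", "c4"], edges))
```

Because macro-F1 averages over labels rather than over examples, rare relations such as $\textit{refutes}$ weigh as much as frequent ones, which is one reason it is a common choice for imbalanced label sets.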

Paper Structure (77 sections, 11 figures, 9 tables)

Introduction
Contributions.
Related Work
Longitudinal Analyses of NLP Research.
Citation Analysis.
Scientific Claim and Argument Mining.
ClaimFlow: A Claim-Centric Dataset for Scientific Progress in NLP
Conceptual Framework
What is a Scientific Claim?
Claim Relations
Definition and Scope.
Assumption and Claim Availability.
Components of Claim--Claim Interaction.
Textual Grounding.
Relation Taxonomy.
...and 62 more sections

Figures (11)

Figure 1: Claims from different papers form a directed claim–claim interaction graph (top). Each edge corresponds to a citing–cited claim pair (middle). Claim Relation Classification predicts their epistemic relation conditioned on the citation context (bottom).
Figure 2: Distribution of claim–claim relations across ClaimFlow-AutoGraph.
Figure 3: Distribution of post-challenge engagement for claims; flow widths indicate the number of claims.
Figure 4: Distribution of the number of papers reusing each claim.
Figure 5: Structural evolution of the claim–claim interaction graph in NLP research.
...and 6 more figures
