ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

Abstract

Scientific papers do more than report results: they advance $\textit{claims}$ that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these interactions explicit at the level of individual scientific claims. We introduce $\texttt{ClaimFlow}$, a claim-centric view of the NLP literature, built from $304$ ACL Anthology papers (1979-2025) that are manually annotated with $1{,}084$ claims and $832$ cross-paper claim relations, indicating whether a citing paper $\textit{supports}$, $\textit{extends}$, $\textit{qualifies}$, $\textit{refutes}$, or references a claim as $\textit{background}$. Using $\texttt{ClaimFlow}$, we define a new task, $\textit{Claim Relation Classification}$, which requires models to infer the scientific stance toward a cited claim from the text and citation context. Evaluating strong neural models and large language models on this task, we report baseline performance of $0.78$ macro-F1, highlighting that claim-relation classification is feasible but challenging. We further apply our model to $\sim 13$k NLP papers to analyze how claims evolve across decades of NLP research. Our analysis reveals that $63.5\%$ of claims are never reused and only $11.1\%$ are ever challenged; meanwhile, widely propagated claims are more often $\textit{reshaped}$ through qualification and extension than directly confirmed or refuted. Overall, $\texttt{ClaimFlow}$ offers a lens for examining how ideas shift and mature within NLP, and a foundation for assessing whether models can interpret scientific argumentation.
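
For readers unfamiliar with the reported metric, the following is a minimal sketch of how macro-F1 could be computed over the five relation labels named in the abstract. The function name `macro_f1`, the `LABELS` list, and the toy gold/predicted labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter

# The five relation labels named in the abstract.
LABELS = ["supports", "extends", "qualifies", "refutes", "background"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A label with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy, made-up labels purely for illustration (not drawn from ClaimFlow).
gold = ["supports", "extends", "qualifies", "refutes", "background", "supports"]
pred = ["supports", "extends", "extends", "refutes", "background", "background"]
score = macro_f1(gold, pred)
```

Because each label contributes equally to the average, a rare relation such as $\textit{refutes}$ weighs as much as a frequent one, so poor performance on a rare class lowers the score noticeably.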

Paper Structure (77 sections, 11 figures, 9 tables)

Figures (11)

  • Figure 1: Claims from different papers form a directed claim-claim interaction graph (top). Each edge corresponds to a citing-cited claim pair (middle). Claim Relation Classification predicts their epistemic relation conditioned on the citation context (bottom).
  • Figure 2: Distribution of claim–claim relations across ClaimFlow-AutoGraph.
  • Figure 3: Distribution of post-challenge engagement for claims; flow widths indicate the number of claims.
  • Figure 4: Distribution of the number of papers reusing each claim.
  • Figure 5: Structural evolution of the claim–claim interaction graph in NLP research.
  • ...and 6 more figures