Structural Scaffolds for Citation Intent Classification in Scientific Publications

Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady

TL;DR

This work tackles citation intent classification by introducing Structural Scaffolds, a multitask neural framework that leverages the structural cues of scientific papers. The model jointly learns the main task of predicting citation intent with two auxiliary tasks: section title prediction and citation worthiness, enabling data-driven incorporation of document structure without manual feature engineering. Empirically, the approach achieves state-of-the-art results on the ACL-ARC dataset (notably 67.9% macro F1 with ELMo and both scaffolds) and demonstrates strong, scalable performance on the new SciCite dataset across multiple scientific domains. The contributions include the SciCite dataset, robust improvements over prior methods, and a demonstration of how auxiliary structural signals can enhance machine reading of scientific literature.
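The sketch below shows one way a main task and two scaffolds could be trained jointly. The total loss is a weighted sum of per-task cross-entropy losses. It assumes that each training example comes from exactly one task's dataset, so scaffold data never needs intent labels. It also assumes that the scaffold losses are scaled by weights. The task names, the weight values, the per-task averaging, and the toy probabilities are all hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    task: str            # hypothetical task names: "intent", "section", "worthiness"
    probs: List[float]   # predicted class distribution from that task's output head
    label: int           # gold class index for that task


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label])


def multitask_loss(batch: List[Example], weights: Dict[str, float]) -> float:
    """Weighted sum of the mean loss of each task present in the batch.

    Each example contributes only to the loss of the task it was drawn from.
    """
    per_task: Dict[str, List[float]] = {}
    for ex in batch:
        per_task.setdefault(ex.task, []).append(cross_entropy(ex.probs, ex.label))
    total = 0.0
    for task, losses in per_task.items():
        total += weights[task] * sum(losses) / len(losses)
    return total


# Main task weight is 1.0; scaffold weights are hypothetical hyperparameters.
weights = {"intent": 1.0, "section": 0.1, "worthiness": 0.05}
batch = [
    Example("intent", [0.7, 0.2, 0.1], 0),
    Example("section", [0.1, 0.6, 0.2, 0.1], 1),
    Example("worthiness", [0.3, 0.7], 1),
]
print(round(multitask_loss(batch, weights), 4))
```

Giving the scaffolds small weights keeps them in a supporting role. The shared encoder is still pushed toward representations that capture section and citation-worthiness cues.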

Abstract

Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature. We propose structural scaffolds, a multitask model to incorporate structural information of scientific papers into citations for effective classification of citation intents. Our model achieves a new state-of-the-art on an existing ACL anthology dataset (ACL-ARC) with a 13.3% absolute increase in F1 score, without relying on external linguistic resources or hand-engineered features as done in existing methods. In addition, we introduce a new dataset of citation intents (SciCite) which is more than five times larger and covers multiple scientific domains compared with existing datasets. Our code and data are available at: https://github.com/allenai/scicite.
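The results above are reported as F1, and the TL;DR specifies macro F1. Macro averaging computes F1 separately for each intent class and then takes the unweighted mean. As a result, rare intents count as much as frequent ones. The following is a minimal pure-Python sketch. The toy labels echo the intent examples in the abstract and are not data from the paper.

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


gold = ["background", "method", "result", "background"]
pred = ["background", "background", "result", "method"]
print(macro_f1(gold, pred))
```

In this toy case the per-class F1 scores are 0.5 (background), 0.0 (method) and 1.0 (result), so the macro average is 0.5. scikit-learn's `f1_score(..., average="macro")` should give the same value on these inputs.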

Paper Structure

This paper contains 20 sections, 5 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Example of citations with different intents (Background and Method).
  • Figure 2: Our proposed scaffold model for identifying citation intents. The main task is predicting the citation intent (top left), and the two scaffolds predict the section title and whether a sentence needs a citation (citation worthiness).
  • Figure 3: Visualization of attention weights for our best scaffold model compared with the best neural baseline model without scaffolds.
  • Figure 4: Confusion matrix showing classification errors of our best model on two datasets. The diagonal is masked to bring focus only on errors.
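Figure 4 masks the diagonal of the confusion matrix so that only misclassifications remain visible. The sketch below builds that kind of error-only matrix in pure Python, with rows as gold labels and columns as predictions. Masked cells are set to `None`, which a plotting step could render as blank cells, for example by converting them to NaN. The labels and predictions are toy values chosen for illustration, not the paper's results.

```python
from typing import Dict, List, Optional


def masked_confusion(
    gold: List[str], pred: List[str], labels: List[str]
) -> List[List[Optional[int]]]:
    """Confusion matrix (rows = gold, cols = predicted) with the diagonal masked."""
    index: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for g, p in zip(gold, pred):
        counts[index[g]][index[p]] += 1
    # Replace correct predictions (the diagonal) with None so only errors remain.
    return [[None if i == j else counts[i][j] for j in range(n)] for i in range(n)]


labels = ["background", "method", "result"]
gold = ["background", "method", "result", "background"]
pred = ["background", "background", "result", "method"]
for label, row in zip(labels, masked_confusion(gold, pred, labels)):
    print(label, row)
```

Reading across a row shows which classes a given gold intent was confused with. Reading down a column shows which gold intents were wrongly assigned to that predicted class.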