Structural Scaffolds for Citation Intent Classification in Scientific Publications
Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady
TL;DR
This work tackles citation intent classification by introducing Structural Scaffolds, a multitask neural framework that exploits the structure of scientific papers. The model is trained jointly on the main task of predicting citation intent and on two auxiliary tasks: section title prediction and citation worthiness prediction. This lets it learn document structure from data instead of relying on manual feature engineering. Empirically, the approach achieves state-of-the-art results on the ACL-ARC dataset (67.9% macro F1 with ELMo and both scaffolds). It also performs strongly on the new SciCite dataset, which spans multiple scientific domains. The contributions are the SciCite dataset, consistent improvements over prior methods, and a demonstration that auxiliary structural signals can improve machine reading of scientific literature.
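The joint training described above can be sketched as a weighted sum of per-task losses. In this sketch the main (citation intent) loss is added to the two scaffold losses, each scaled by its own weight. It is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Each task head is assumed to have already produced logits from a shared encoder. Each task is assumed to have its own batch of (logits, label) pairs. The names `scaffold_loss`, `lambda_section` and `lambda_worthiness` are hypothetical, and the weights are left for the caller to choose.

```python
import math
from typing import List, Sequence, Tuple

# Each example is (logits over that task's classes, index of the gold class).
Batch = List[Tuple[Sequence[float], int]]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def task_loss(batch: Batch) -> float:
    """Summed cross-entropy over all examples of a single task."""
    return sum(cross_entropy(logits, label) for logits, label in batch)


def scaffold_loss(intent_batch: Batch, section_batch: Batch,
                  worthiness_batch: Batch,
                  lambda_section: float, lambda_worthiness: float) -> float:
    """Hypothetical multitask objective: main intent loss plus weighted scaffold losses."""
    return (task_loss(intent_batch)
            + lambda_section * task_loss(section_batch)
            + lambda_worthiness * task_loss(worthiness_batch))


# Toy usage with made-up logits: 3 intent classes, 5 section classes, 2 worthiness classes.
intent = [([2.0, 0.1, -1.0], 0)]
section = [([0.3, 1.2, 0.0, -0.5, 0.1], 1)]
worthiness = [([0.4, -0.2], 0)]
total = scaffold_loss(intent, section, worthiness,
                      lambda_section=0.1, lambda_worthiness=0.1)
print(f"combined loss: {total:.4f}")
```

Setting both weights to zero recovers plain single-task training. The weights therefore control how much the structural scaffolds shape the shared encoder.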
Abstract
Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature. We propose structural scaffolds, a multitask model to incorporate structural information of scientific papers into citations for effective classification of citation intents. Our model achieves a new state-of-the-art on an existing ACL anthology dataset (ACL-ARC) with a 13.3% absolute increase in F1 score, without relying on external linguistic resources or hand-engineered features as done in existing methods. In addition, we introduce a new dataset of citation intents (SciCite) which is more than five times larger and covers multiple scientific domains compared with existing datasets. Our code and data are available at: https://github.com/allenai/scicite.
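The TL;DR reports results as macro F1. This metric computes an F1 score separately for each intent class and then averages the scores without weighting by class frequency. As a result, rare intents count as much as common ones. The function below is a minimal, dependency-free sketch of that calculation. The label strings are illustrative only, loosely based on the example intents in the abstract.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


gold = ["background", "method", "result", "background"]
pred = ["background", "method", "background", "background"]
print(macro_f1(gold, pred))  # approximately 0.6: per-class F1 is 0.8, 1.0 and 0.0
```

In this toy example the misclassified "result" citation drags the macro average down sharply, even though accuracy is 75%. This sensitivity to minority classes is why macro F1 is a natural choice for imbalanced intent labels.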
