Cited Text Spans for Citation Text Generation

Xiangci Li; Yi-Hui Lee; Jessica Ouyang

TL;DR

This work advances citation text generation by grounding outputs in the exact text spans (CTS) of cited papers rather than relying solely on abstracts, addressing the hallucination risk of abstractive methods. It demonstrates that distantly labeled CTS can scale to large datasets while maintaining fidelity to ground truth, and it introduces practical CTS retrieval (Context, Oracle, Keyword) and generation (RAG-FiD, LED) strategies evaluated on the CORWA dataset. The findings show CTS-based generation yields higher token overlap with target citations and improved faithfulness compared to abstract-only baselines, though fully automatic CTS retrieval remains challenging and benefits from a human-in-the-loop approach. The study highlights practical considerations for grounding, including dataset design, retrieval quality, and potential post-processing to mitigate plagiarism, offering a feasible path toward reliable, text-grounded citation generation in real-world use.
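The three retrieval strategies can be read as one ranking procedure over the cited paper's sentences that differs only in the query it uses. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the following queries:

- Context uses the citing paper's text around the citation.
- Oracle uses the gold citation text, which is available only for analysis.
- Keyword uses a few terms supplied by a human.

It also swaps in plain unigram overlap as the scoring function. The helpers `tokenize`, `overlap_score`, and `retrieve_cts`, the cutoff `k`, and the toy sentences are all hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, sentence: str) -> float:
    """Fraction of distinct query tokens that also appear in the sentence."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(sentence))) / len(query_tokens)


def retrieve_cts(cited_sentences: List[str], query: str, k: int = 2) -> List[int]:
    """Indices of the k highest-scoring cited sentences, in document order.

    Ties keep document order because Python's sort is stable,
    including when reverse=True.
    """
    ranked = sorted(
        range(len(cited_sentences)),
        key=lambda i: overlap_score(query, cited_sentences[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy cited-paper sentences (illustrative only, not from any real paper).
cited = [
    "Toy sentence one describes the overall task.",
    "Toy sentence two covers content selection and realization.",
    "Toy sentence three reports the datasets.",
]

# Hypothetical queries per strategy: Context would pass the citing paper's
# surrounding text, Oracle the gold citation text, Keyword user-typed terms.
keyword_query = "content selection"
print(retrieve_cts(cited, keyword_query, k=1))  # → [1]
```

The sketch returns the selected sentences in document order rather than score order. This is an assumed design choice: the generator then reads the retrieved CTS in the same order in which they appear in the cited paper.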

Abstract

An automatic citation generation system aims to concisely and accurately describe the relationship between two scientific articles. To do so, such a system must ground its outputs in the content of the cited paper to avoid non-factual hallucinations. Due to the length of scientific documents, existing abstractive approaches have conditioned only on cited paper abstracts. We demonstrate empirically that the abstract is not always the most appropriate input for citation generation and that models trained in this way learn to hallucinate. We propose instead to condition on the cited text span (CTS). Because manual CTS annotation is extremely time- and labor-intensive, we experiment with distant labeling of candidate CTS sentences and find that it performs well enough to substitute for expensive human annotations in model training. We also propose a human-in-the-loop, keyword-based CTS retrieval approach that makes generating citation texts grounded in the full text of cited papers both promising and practical.
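The two recall variants defined for Figure 2 can measure how well distant labels stand in for human annotations. This is a minimal sketch that assumes each citation's distant and human labels are given as sets of sentence indices in the cited paper. Two details are assumptions, not taken from the paper: the function name `recall_any_individual` and the choice to skip citations that have no human-labeled sentences.

```python
from typing import List, Set, Tuple


def recall_any_individual(
    distant: List[Set[int]], human: List[Set[int]]
) -> Tuple[float, float]:
    """Recall of distant labels against human labels, in two variants.

    distant[i] and human[i] hold the sentence indices labeled as CTS for
    citation i. "Any": each citation is one data point, counted as a hit if
    at least one human-labeled sentence is also distantly labeled.
    "Individual": each human-labeled sentence is its own data point.
    """
    # Assumption: citations with no human-labeled sentences are skipped.
    pairs = [(d, h) for d, h in zip(distant, human) if h]
    any_hits = sum(1 for d, h in pairs if d & h)
    individual_hits = sum(len(d & h) for d, h in pairs)
    n_human = sum(len(h) for _, h in pairs)
    any_recall = any_hits / len(pairs) if pairs else 0.0
    individual_recall = individual_hits / n_human if n_human else 0.0
    return any_recall, individual_recall


# Toy labels for three citations (illustrative only).
distant = [{0, 1}, {5}, {2, 3}]  # distantly labeled sentence indices per citation
human = [{1, 4}, {6}, {2, 3}]    # human-labeled sentence indices per citation
print(recall_any_individual(distant, human))  # Any = 2/3, Individual = 3/5
```

"Any" recall is always at least as lenient as "Individual" recall, because a single overlapping sentence credits the whole citation.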

Paper Structure (41 sections, 6 figures, 10 tables)

Introduction
Background and Related Work
A Large-Scale CTS Dataset: Distant vs. Human Labeling
Approach
Evaluation Metrics
Token overlap.
Faithfulness.
Findings
Distant labeling has high coverage of human annotations.
Distant labeling is more similar to the gold citation text.
Distant labeling performs better on the downstream generation task.
Citation Text Generation Using CTS
Problem Formulation
Data.
Approach
...and 26 more sections

Figures (6)

Figure 1: Overview of the proposed CTS-based citation generation approach. The Context, Oracle, and Keyword strategies are used to retrieve CTS from the cited paper (Konstas and Lapata, 2012) and generate a citation text for the target paper (Gong et al., 2019). See the example figure for details of the example.
Figure 2: Performance of distantly-labeled CTS, measured by recall against human-labeled CTS. Solid lines ("Any"): each citation counts as one data point; a true positive is when at least one human-labeled sentence is distantly-labeled. Dotted lines ("Individual"): each human-labeled sentence is a separate data point. Green lines ("A"): AbuRa'ed. Yellow lines ("S"): CL-SciSumm.
Figure 3: Distribution of cited-paper lengths, measured in sentences, for CL-SciSumm and AbuRa'ed.
Figure 4: Overlap of the top-$k$ distantly-labeled CTS (solid lines) and human-labeled CTS (dotted lines), measured by ROUGE-L recall against the gold citation text. Green lines ("A"): AbuRa'ed. Yellow lines ("S"): CL-SciSumm. The average length of human-labeled CTS is $|\overline{A}| = 16.8$ and $|\overline{S}| = 2.5$ sentences.
Figure 5: Extractiveness of the generated citation texts, measured by coverage and density (Grusky et al., 2018) against the generation input (abstract or CTS). The average coverage (C), density (D), and compression ratio (R) are shown for each generation setting. The "Gold" setting is measured using Oracle CTS as the "input".
...and 1 more figure
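The extractiveness measures in Figure 5 follow the definitions of Grusky et al. (2018):

- Shared fragments are found by greedy longest matching.
- Coverage is the fraction of output tokens that fall inside a fragment.
- Density sums the squared fragment lengths and divides by the output length.
- The compression ratio is input length over output length.

This sketch re-implements those definitions for illustration only; it is not the authors' evaluation code. It assumes whitespace tokenization, and the helper names `extractive_fragments` and `coverage_density_ratio` are hypothetical.

```python
from typing import List, Tuple


def extractive_fragments(source: List[str], output: List[str]) -> List[List[str]]:
    """Greedy shared-fragment matching (Grusky et al., 2018).

    For each output position, find the longest token run starting there that
    also appears contiguously in the source, then skip past that run.
    """
    fragments: List[List[str]] = []
    i = 0
    while i < len(output):
        best: List[str] = []
        j = 0
        while j < len(source):
            if output[i] == source[j]:
                i_end, j_end = i, j
                # Extend the match while both sequences keep agreeing.
                while (i_end < len(output) and j_end < len(source)
                       and output[i_end] == source[j_end]):
                    i_end += 1
                    j_end += 1
                if i_end - i > len(best):
                    best = output[i:i_end]
                j = j_end
            else:
                j += 1
        i += max(len(best), 1)
        if best:
            fragments.append(best)
    return fragments


def coverage_density_ratio(source: List[str], output: List[str]) -> Tuple[float, float, float]:
    """Coverage (C), density (D), and compression ratio (R) of output vs. source."""
    if not output:
        raise ValueError("output must contain at least one token")
    frags = extractive_fragments(source, output)
    n = len(output)
    coverage = sum(len(f) for f in frags) / n
    density = sum(len(f) ** 2 for f in frags) / n
    ratio = len(source) / n
    return coverage, density, ratio


# Toy example with whitespace tokenization (illustrative only).
source = "we propose a joint model for content selection".split()
output = "they propose a joint model".split()
print(coverage_density_ratio(source, output))  # C = 4/5, D = 16/5, R = 8/5
```

Density rewards long copied runs quadratically. For that reason it separates near-verbatim reuse of the input more sharply than coverage does.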
