Improving Citation Text Generation: Overcoming Limitations in Length Control

Biswadip Mandal, Xiangci Li, Jessica Ouyang

TL;DR

The paper tackles length mismatch in citation text generation by studying the limits of automatic length prediction and evaluating heuristic length estimates. It introduces a joint framework that predicts a target length $\hat{len}$ and conditions generation on it through length-difference positional encoding (LDPE) in the decoder of a Longformer Encoder-Decoder, comparing several training regimes. Empirically, conditioning on the ground-truth length yields the largest gains, while predicted lengths remain noisy; heuristic author- and context-based length cues often give the best practical performance. The work offers a practical route to better citation quality, releases code for replication, and underscores the value of author-specific length cues over purely predicted lengths.
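To make "conditioning on a target length" concrete, below is a minimal NumPy sketch of a length-difference positional encoding in the form usually attributed to Takase and Okazaki (2019): each decoder position encodes the number of tokens still remaining, $\hat{len} - pos$, with the usual sinusoidal base of 10000. This is an assumption about the general technique, not the authors' code; the function name `ldpe` and its parameters are hypothetical, and how the encoding is wired into the Longformer Encoder-Decoder is not shown.

```python
from typing import Optional

import numpy as np


def ldpe(target_len: int, d_model: int, num_positions: Optional[int] = None) -> np.ndarray:
    """Hypothetical length-difference positional encoding, one row per decoder position.

    Row `pos` encodes the remaining length (target_len - pos) instead of the
    absolute position, so the decoder can "see" how many tokens are left.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    # Default: cover positions 0..target_len, i.e. until nothing is left.
    n = target_len + 1 if num_positions is None else num_positions
    pos = np.arange(n)[:, None]                        # shape (n, 1)
    remaining = (target_len - pos).astype(np.float64)  # shape (n, 1)
    i = np.arange(d_model // 2)[None, :]               # shape (1, d_model / 2)
    denom = np.power(10000.0, 2 * i / d_model)         # same base as sinusoidal PE
    enc = np.empty((n, d_model))
    enc[:, 0::2] = np.sin(remaining / denom)           # even dimensions
    enc[:, 1::2] = np.cos(remaining / denom)           # odd dimensions
    return enc


# Usage: a (predicted or ground-truth) length of 12 tokens, toy model width 8.
pe = ldpe(target_len=12, d_model=8)
print(pe.shape)  # (13, 8)
```

Because each row depends only on the remaining length, the encoding for position 3 of a 10-token target equals that for position 0 of a 7-token target. This is also why a noisy $\hat{len}$ hurts: the decoder is told to stop at the wrong point.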

Abstract

A key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation. While prior works have investigated length-controlled generation, their effectiveness depends on knowing the appropriate generation length. In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of heuristic estimates of desired length.
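The summary does not specify how the heuristic length estimates are computed. As one hypothetical illustration of an author-based cue, the sketch below predicts a citation's length as the citing author's median length in training data and falls back to the corpus-wide median for authors with too few examples. The names `fit_author_lengths`, `estimate_length`, and `min_count` are invented for this example.

```python
from collections import defaultdict
from statistics import median


def fit_author_lengths(examples, min_count=2):
    """Hypothetical: per-author median citation lengths from (author_id, n_tokens) pairs.

    Authors with fewer than `min_count` training citations are left out, so
    their estimates fall back to the corpus-wide median.
    """
    by_author = defaultdict(list)
    for author, n_tokens in examples:
        by_author[author].append(n_tokens)
    global_median = median(n for lengths in by_author.values() for n in lengths)
    author_median = {
        author: median(lengths)
        for author, lengths in by_author.items()
        if len(lengths) >= min_count
    }
    return author_median, global_median


def estimate_length(author, author_median, global_median):
    """Heuristic target length: the author's median if known, else the global median."""
    return author_median.get(author, global_median)


train = [("a", 10), ("a", 20), ("a", 30), ("b", 50)]
table, fallback = fit_author_lengths(train)
print(estimate_length("a", table, fallback))  # 20
print(estimate_length("b", table, fallback))  # 25.0 (only one citation by "b")
```

The median is used instead of the mean so that a single unusually long citation does not dominate an author's estimate; a real system would round the result before passing it to a length-controlled decoder.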

Paper Structure (14 sections, 5 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Examples of generated citations that are too long (top) or too short (bottom).
  • Figure 2: Length difference in tokens between ground-truth citations and the citations generated by Li et al. (2022, CORWA).
  • Figure 3: Architecture of our joint length prediction and controlled citation generation models.
  • Figure 4: Example of over-long citation resulting in a hallucinated criticism of the cited paper.
  • Figure 5: Example of an overly short, generic citation.
  • ...and 2 more figures