Improving Citation Text Generation: Overcoming Limitations in Length Control
Biswadip Mandal, Xiangci Li, Jessica Ouyang
TL;DR
The paper tackles the problem of length mismatch in citation text generation by studying the limits of automatic length prediction and evaluating heuristic length estimates. It introduces a joint framework that predicts a target length $\hat{len}$ and conditions generation via a length-aware decoder (LDPE) within a Longformer Encoder-Decoder, comparing multiple training regimes. Empirically, using the ground-truth length yields the largest gains, while predicted lengths remain noisy; heuristic author- and context-based length cues often offer the best practical performance. The work highlights a practical approach to improving citation quality and provides code for replication, underscoring the value of author-specific length cues over purely predicted lengths.
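To make the length-conditioning idea concrete, the following is a minimal sketch, assuming that LDPE refers to the length-difference positional encoding of Takase and Okazaki, in which the sinusoidal encoding at each decoder step depends on the number of tokens remaining until the target length rather than on the absolute position. The function name `ldpe` and its parameters are hypothetical and are not taken from the paper's released code.

```python
import math


def ldpe(pos: int, target_len: int, d_model: int) -> list[float]:
    """Sketch of a length-difference positional encoding for one decoder step.

    Assumption: the encoding is the usual sinusoidal one, but driven by the
    remaining length (target_len - pos) instead of the absolute position.
    """
    remaining = target_len - pos
    encoding = []
    for i in range(0, d_model, 2):
        # Same frequency schedule as the standard Transformer encoding.
        angle = remaining / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))
        if i + 1 < d_model:
            encoding.append(math.cos(angle))
    return encoding


# Two steps with the same number of remaining tokens share an encoding,
# which is what lets the decoder "count down" toward the target length.
same_remaining = ldpe(3, 10, 6) == ldpe(0, 7, 6)
```

Under this assumption, the target length passed in could be the ground-truth length, a predicted $\hat{len}$, or a heuristic estimate; the encoding itself does not depend on where the length comes from.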
Abstract
A key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation. While prior works have investigated length-controlled generation, their effectiveness depends on knowing the appropriate generation length. In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of heuristic estimates of desired length.
