ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links

Serwar Basch, Ilia Kuznetsov, Tom Hope, Iryna Gurevych

TL;DR

The paper presents a domain-agnostic framework for bootstrapping sentence-level cross-document links from semi-synthetic data, evaluated in two stages: automatic benchmarking of linking approaches, followed by a large-scale human-in-the-loop annotation study. It shows that combining a strong retriever (Dragon+) with LLM-based classification (R+LLM) improves both recall and precision over retrieval alone in the peer-review and news domains, with human annotators approving 73% of the suggested links. The approach supports scalable generation of linked datasets and practical annotation workflows that reduce manual effort; the authors release code, data, and annotation protocols. The results point to downstream uses such as media framing analysis and peer-review assessment, provided that domain characteristics and prompt design are taken into account.

Abstract

Understanding fine-grained links between documents is crucial for many applications, yet progress is limited by the lack of efficient methods for data curation. To address this limitation, we introduce a domain-agnostic framework for bootstrapping sentence-level cross-document links from scratch. Our approach (1) generates and validates semi-synthetic datasets of linked documents, (2) uses these datasets to benchmark and shortlist the best-performing linking approaches, and (3) applies the shortlisted methods in large-scale human-in-the-loop annotation of natural text pairs. We apply the framework in two distinct domains -- peer review and news -- and show that combining retrieval models with LLMs achieves a 73% human approval rate for suggested links, more than doubling the acceptance of strong retrievers alone. Our framework allows users to produce novel datasets that enable systematic study of cross-document understanding, supporting downstream tasks such as media framing analysis and peer review assessment. All code, data, and annotation protocols are released to facilitate future research.
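The retrieve-then-classify idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine scorer stands in for a dense retriever such as Dragon+, and `overlap_judge` is a hypothetical stand-in for the LLM classifier. All function names and the example sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokens(text: str) -> List[str]:
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def bow_cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (stand-in for a dense retriever)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(source: str, targets: List[str], top_k: int) -> List[int]:
    """Stage 1: indices of the top_k target sentences by similarity."""
    scores = [bow_cosine(source, t) for t in targets]
    ranked = sorted(range(len(targets)), key=lambda i: -scores[i])
    return ranked[:top_k]


def retrieve_then_classify(
    source: str,
    targets: List[str],
    is_link: Callable[[str, str], bool],
    top_k: int = 3,
) -> List[int]:
    """Stage 2: keep only the retrieved candidates the classifier accepts."""
    return [i for i in retrieve(source, targets, top_k) if is_link(source, targets[i])]


def overlap_judge(a: str, b: str) -> bool:
    """Hypothetical classifier: accept if at least three word types are shared.
    In an R+LLM setup this call would be replaced by an LLM judgment."""
    return len(set(tokens(a)) & set(tokens(b))) >= 3


source = "The evaluation lacks a comparison with strong baselines."
targets = [
    "We propose a new model for linking.",
    "We compare our model with three strong baselines in the evaluation.",
    "The dataset contains news articles.",
    "Baselines are described in the appendix.",
]
links = retrieve_then_classify(source, targets, overlap_judge, top_k=2)
```

The retriever narrows the target document to a short candidate list, and the classifier then filters that list, so the more expensive judgment runs on only `top_k` sentences per source sentence.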

Paper Structure

This paper contains 45 sections, 15 figures, 15 tables.

Figures (15)

  • Figure 1: Framework overview.
  • Figure 2: R+LLM Setup.
  • Figure 3: F1 scores across datasets using different prompt configurations and models (Phi-4, Qwen2.5, GPT-4o). Results show a consistent trend across domains with listwise prompting outperforming pairwise prompting. Furthermore, combination prompts (tested listwise only) achieve the highest F1 scores on synthetic datasets, while performance gaps narrow on the converted datasets. These results showcase the interaction between prompt design, model capabilities, and dataset complexity.
  • Figure 4: LLM-only ablation. Models were prompted with both full documents (source and target), the specific source sentence, a link description, and in-context examples. While this setup captures more context and task-specific information, it still underperforms compared to the combination of Retriever and LLM. This highlights the importance of retrieval for narrowing the candidate space and reducing distractors, especially in long documents.
  • Figure 5: Prompt template for LLM-based pairwise sentence classification.
  • ...and 10 more figures
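The distinction between pairwise and listwise prompting mentioned in the Figure 3 caption can be sketched as follows. The prompt wording here is hypothetical and is not the paper's template; the sketch only illustrates the structural difference: pairwise prompting issues one prompt per candidate, while listwise prompting presents all candidates in a single prompt.

```python
from typing import List


def pairwise_prompts(source: str, candidates: List[str]) -> List[str]:
    """One prompt per (source, candidate) pair; wording is illustrative only."""
    return [
        f"Source: {source}\nCandidate: {c}\nAre these sentences linked? Answer yes or no."
        for c in candidates
    ]


def listwise_prompt(source: str, candidates: List[str]) -> str:
    """A single prompt listing every candidate; wording is illustrative only."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates, start=1))
    return (
        f"Source: {source}\nCandidates:\n{numbered}\n"
        "List the numbers of all candidates linked to the source."
    )


example_candidates = ["first candidate", "second candidate", "third candidate"]
pair_batch = pairwise_prompts("a source sentence", example_candidates)
list_single = listwise_prompt("a source sentence", example_candidates)
```

With `n` candidates, the pairwise variant requires `n` model calls and judges each candidate in isolation, whereas the listwise variant requires one call and lets the model compare candidates against each other.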