Learning Fine-Grained Grounded Citations for Attributed Large Language Models

Lei Huang; Xiaocheng Feng; Weitao Ma; Yuxuan Gu; Weihong Zhong; Xiachong Feng; Weijiang Yu; Weihua Peng; Duyu Tang; Dandan Tu; Bing Qin

TL;DR

FRONT tackles LLM hallucinations by introducing fine-grained grounded citations anchored to extractive quotes. It combines an automatic data-generation pipeline with a two-stage training framework, Grounding Guided Generation ($G^3$) for grounding and Consistency-Aware Alignment (CAA) for alignment, to produce grounded answers with precise citations. Evaluations on the ALCE benchmark show substantial gains in citation quality, with FRONT outperforming baselines and even surpassing ChatGPT in some settings, while demonstrating good generalization and scalability. The approach improves verifiability and makes it easier for users to perform fine-grained checks, advancing reliable information seeking with attributed LLMs.

Abstract

Despite the impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, have shown potential in mitigating hallucinations and improving verifiability. However, current approaches suffer from suboptimal citation quality due to their reliance on in-context learning. Furthermore, the practice of citing only coarse document identifiers makes it challenging for users to perform fine-grained verification. In this work, we introduce FRONT, a training framework designed to teach LLMs to generate Fine-Grained Grounded Citations. By grounding model outputs in fine-grained supporting quotes, these quotes guide the generation of grounded and consistent responses, not only improving citation quality but also facilitating fine-grained verification. Experiments on the ALCE benchmark demonstrate the efficacy of FRONT in generating superior grounded responses and highly supportive citations. With LLaMA-2-7B, the framework significantly outperforms all the baselines, achieving an average of 14.21% improvement in citation quality across all datasets, even surpassing ChatGPT.
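To make the idea of fine-grained verification concrete, here is a minimal sketch of how a reader-side tool might check that each supporting quote really appears in the document it cites. The `Quote` type, the `verify_quotes` function, and the whitespace/case normalization are illustrative assumptions, not part of FRONT or the ALCE evaluation.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical container for one supporting quote (not from the paper)."""
    doc_id: int  # 1-based index of the cited document
    text: str    # span claimed to be copied from that document


def normalize(s: str) -> str:
    """Collapse whitespace and lowercase so formatting differences are ignored."""
    return " ".join(s.split()).lower()


def verify_quotes(quotes: list[Quote], documents: list[str]) -> list[bool]:
    """Return, per quote, whether it occurs verbatim in its cited document."""
    results = []
    for q in quotes:
        in_range = 1 <= q.doc_id <= len(documents)
        # Short-circuit on out-of-range ids so we never index past the list.
        results.append(
            in_range and normalize(q.text) in normalize(documents[q.doc_id - 1])
        )
    return results
```

A document-level citation such as `[2]` only tells the reader which passage to reread; a quote that passes a check like this one points to the exact span to inspect.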

Paper Structure (58 sections, 7 equations, 9 figures, 12 tables)

Introduction
Related Work
Retrieval Augmented Generation.
Attributed Large Language Models.
Task Formulation and Methodology
Automatic Data Generation Pipeline
Data Collection.
Attributed Answer Generation.
Data Filtering.
Two-Stage Training Recipe
Grounding Guided Generation
Consistency-Aware Alignment
Experimental Settings
Datasets
ASQA
...and 43 more sections

Figures (9)

Figure 1: Compared with the current attributed systems, the core idea behind FRONT is to first select the supporting quotes from retrieved sources and then condition the generation process on them, ensuring grounded responses and accurate citations.
Figure 2: Overview of the data generation pipeline. The pipeline consists of three primary steps: data collection, answer generation, and data filtering. Firstly, given a user query, the data collection module retrieves the top 100 relevant documents and employs a reranking model to select the top 5 most pertinent documents. Subsequently, attributed responses are generated by distilling ChatGPT via in-context learning. Finally, all responses are filtered by the data filtering module to ensure informativeness and attributability.
Figure 3: Overview of FRONT. The training recipe consists of two stages: grounding-guided generation and consistency-aware alignment. It enables LLMs to first generate precise grounding and subsequently guide the generation of attributed answers, thereby enhancing fine-grained attribution capability.
Figure 4: Ablation Study on Data Filtering.
Figure 5: Ablation study of different grounding guidance strategies on the ELI5 dataset.
...and 4 more figures
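
The three steps named in the Figure 2 caption (data collection, answer generation, data filtering) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the callable-based scorers, and the returned dictionary layout are hypothetical, and the actual retriever, reranker, ChatGPT prompt, and filtering criteria are not specified in this section. Only the top-100 and top-5 cut-offs come from the caption.

```python
from typing import Callable, Optional

# (query, document) -> relevance score; stands in for a retriever or reranker.
Scorer = Callable[[str, str], float]


def top_k(query: str, docs: list[str], score: Scorer, k: int) -> list[str]:
    """Return the k documents with the highest score for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def build_example(
    query: str,
    corpus: list[str],
    retrieve_score: Scorer,
    rerank_score: Scorer,
    generate: Callable[[str, list[str]], str],    # hypothetical LLM call
    keep: Callable[[str, list[str], str], bool],  # hypothetical quality filter
    n_retrieve: int = 100,  # caption: retrieve the top 100 documents
    n_rerank: int = 5,      # caption: rerank down to the top 5
) -> Optional[dict]:
    """Run collection, generation, and filtering for a single query."""
    # Step 1: data collection (retrieve, then rerank).
    candidates = top_k(query, corpus, retrieve_score, n_retrieve)
    passages = top_k(query, candidates, rerank_score, n_rerank)
    # Step 2: attributed answer generation.
    answer = generate(query, passages)
    # Step 3: data filtering; discard examples that fail the check.
    if not keep(query, passages, answer):
        return None
    return {"query": query, "documents": passages, "answer": answer}
```

Passing the scorers, generator, and filter in as callables keeps the sketch independent of any particular retrieval library or model API.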
