Advancing Large Language Model Attribution through Self-Improving

Lei Huang, Xiaocheng Feng, Weitao Ma, Liang Zhao, Yuchun Fan, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

TL;DR

START, a Self-Taught AttRibuTion framework, iteratively improves the attribution capability of LLMs by using fine-grained preference supervision signals constructed from the model's own sampled responses to encourage robust, comprehensive, and attributable generation.

Abstract

Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive to obtain. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations or more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.

Paper Structure

This paper contains 55 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The data synthesis pipeline consists of five steps: given a user query, the LLM first generates an informative response without citations in a closed-book setting. Subsequently, the LLM decomposes this response into atomic claims. These claims are then randomly grouped into specific sets, which serve as the basis for generating documents that cover all included claims. Finally, we trace back to the initial response to relabel the citations.
  • Figure 2: Overview of our self-improving framework, which consists of two stages. The model is first warmed up using synthetic data. This provides a good starting point that enables the model to generate high-quality samples in the subsequent iterative training. Next, the model is further trained iteratively via rejection sampling fine-tuning and fine-grained preference optimization. This iterative process bootstraps the model's attribution capability by fully utilizing the supervision signals from its sampled generations.
  • Figure 3: The impact of supervision signals from different stages (synthetic data vs. self-improvement) on attribution performance across ASQA, ELI5, and StrategyQA. The blue line represents the model that undergoes only supervised fine-tuning using synthetic data at iteration 0. The red line represents the model that first trains for two epochs with synthetic data at iteration 0, followed by one iteration of self-improvement.
  • Figure 4: Ablation study on the effect of synthetic data size on attribution and correctness performance. We sample 1k, 3k, and 5k user queries for data synthesis.
  • Figure 5: Illustration of the prompting design for the data synthesis pipeline.
  • ...and 2 more figures
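
The five steps in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable, the prompt strings, the `group_claims` helper, and the `num_docs` parameter are all hypothetical. The last step is also simplified, since it attaches a citation to each atomic claim rather than tracing back into the sentences of the initial response.

```python
import random
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to model text.
LLM = Callable[[str], str]


def group_claims(num_claims: int, num_groups: int, rng: random.Random) -> List[List[int]]:
    """Randomly partition claim indices into at most num_groups non-empty groups."""
    num_groups = max(1, num_groups)
    indices = list(range(num_claims))
    rng.shuffle(indices)
    groups = [sorted(indices[i::num_groups]) for i in range(num_groups)]
    return [g for g in groups if g]


def synthesize_example(query: str, llm: LLM, num_docs: int = 3, seed: int = 0) -> Dict:
    rng = random.Random(seed)

    # Step 1: closed-book response without citations (prompt wording is assumed).
    response = llm(f"Answer without citations: {query}")

    # Step 2: decompose the response into atomic claims, one per line.
    raw = llm(f"Decompose into atomic claims, one per line:\n{response}")
    claims = [line.strip() for line in raw.splitlines() if line.strip()]

    # Step 3: randomly group the claims into sets.
    groups = group_claims(len(claims), num_docs, rng)

    # Step 4: generate one document per claim set, covering every claim in it.
    documents = [
        llm("Write a document covering these claims:\n" + "\n".join(claims[i] for i in g))
        for g in groups
    ]

    # Step 5 (simplified): label each claim with the 1-based index of the
    # document that was generated from its claim set.
    claim_to_doc = {i: d + 1 for d, g in enumerate(groups) for i in g}
    cited_answer = " ".join(f"{claims[i]} [{claim_to_doc[i]}]" for i in range(len(claims)))

    return {
        "query": query,
        "response": response,
        "claims": claims,
        "documents": documents,
        "claim_to_doc": claim_to_doc,
        "cited_answer": cited_answer,
    }
```

Because each document is generated from a known claim set, the citation labels are known by construction, so no separate attribution annotator is needed in this sketch.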