Training Language Models to Generate Text with Citations via Fine-grained Rewards

Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang

TL;DR

This work tackles hallucination and low credibility in language models by training them for attributable text generation with fine-grained rewards that separately optimize information correctness and citation quality. It combines distillation from a strong proprietary model with rejection sampling (RS) and reinforcement learning (RL) to encourage well-cited outputs, and compares the result against holistic-reward baselines. Experiments on the ALCE datasets and EXPERTQA show that fine-grained rewards significantly boost performance, with RS and RS+RL often surpassing ChatGPT, especially for smaller models. The study also analyzes the contribution of retrieval, citation error modes, and generalization to out-of-domain data, and it outlines avenues for further improvement in retrieval and iterative learning.

Abstract

While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performance is far from satisfactory, especially when it comes to smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating their advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, the incorporation of fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo.

Paper Structure (65 sections, 3 equations, 4 figures, 17 tables)

This paper contains 65 sections, 3 equations, 4 figures, 17 tables.

Figures (4)

  • Figure 1: An example of ChatGPT performing the task of attributable generation. The model takes a question, retrieved passages, and the task instruction (omitted due to space limit) as the input, and generates a response with in-text citations. The response has 3 sentences, 2 of which do not have supporting citations. The third one has an irrelevant citation [1]. Moreover, ChatGPT does not capture the correct answer (451,225) mentioned in passages [3] and [4].
  • Figure 2: Right: The assignment of our fine-grained rewards ($R_1$: Answer Correctness, $R_2$: Citation Recall, $R_3$: Citation Precision). These rewards are assigned to corresponding tokens in the response (citation, EOS token, etc.; highlighted in yellow). Left: An overview of our framework. Top: Distillation from ChatGPT; Middle: Rejection Sampling; Bottom: Reinforcement Learning.
  • Figure 3: Left: Examples of how the Correctness metrics are computed for ASQA (EM Rec), QAMPARI (Rec.-5, Prec), and ELI5 (Claim Rec) respectively; Right: An example of how the Citation Recall and Citation Precision are computed.
  • Figure 4: Training curves of LLaMA-2-7B trained with fine-grained RL (f.g.RL) in the combined setting, measured on the development set across 3 independent runs. The shaded region indicates the standard error across these runs.
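
The token-level reward assignment described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation. The names `RewardEvent` and `dense_rewards`, the reward values, and the choice to place the citation-recall reward on the sentence-final token are all assumptions made for illustration; the caption only states that rewards land on corresponding tokens such as citations and the EOS token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RewardEvent:
    """A single sparse reward (hypothetical structure, not from the paper)."""
    token_index: int  # position in the response token sequence
    kind: str         # "correctness", "citation_recall", or "citation_precision"
    value: float      # reward magnitude; the values used below are made up


def dense_rewards(num_tokens: int, events: List[RewardEvent]) -> List[float]:
    """Scatter sparse reward events into a per-token reward vector."""
    rewards = [0.0] * num_tokens
    for ev in events:
        if not 0 <= ev.token_index < num_tokens:
            raise IndexError(f"token index {ev.token_index} is out of range")
        rewards[ev.token_index] += ev.value
    return rewards


# Toy response: two sentences, one citation, then the EOS token.
tokens = ["Paris", "is", "the", "capital", "[1]", ".",
          "It", "is", "large", ".", "<eos>"]

events = [
    RewardEvent(4, "citation_precision", 0.2),  # R3: citation [1] judged relevant
    RewardEvent(5, "citation_recall", 0.2),     # R2: sentence 1 is supported (assumed placement)
    RewardEvent(9, "citation_recall", -0.2),    # R2: sentence 2 has no supporting citation
    RewardEvent(10, "correctness", 1.0),        # R1: answer correctness at the EOS token
]

rewards = dense_rewards(len(tokens), events)
```

In a holistic-reward setup, by contrast, the same signals would be collapsed into one scalar attached to the final token, so the learner gets no indication of which sentence or citation caused a penalty.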