Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi

TL;DR

This work investigates verifiability in large language models by training them to quote verbatim from trusted pre-training sources, notably Wikipedia. It introduces Quote-Tuning, which uses a fast membership test to measure quoting via QUIP-Score, builds a synthetic quoting preference dataset, and applies Direct Preference Optimization to obtain a quoting-biased model. Across long-form QA and open-ended text completion, Quote-Tuning yields substantial quoting gains with minimal or no loss in generation quality and even improves truthfulness on TruthfulQA, while generalizing to new domains and model families. The approach offers a scalable, annotation-free path to verifiable, trustworthy LLM outputs that complements retrieval-based and citation-grounded methods.

Abstract

To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function to quantify quotes in model responses, and curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality. Quote-Tuning is applicable in different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness. Our method not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.

Paper Structure (32 sections, 2 equations, 3 figures, 11 tables)

This paper contains 32 sections, 2 equations, 3 figures, and 11 tables.

Figures (3)

  • Figure 1: Pipeline of Quote-Tuning. The algorithm works by (1) sampling multiple responses from a pre-trained LLM, (2) constructing preference data via rank-by-quoting, and (3) preference optimization to quote.
  • Figure 2: Length distribution of the dispreferred and preferred responses with and without the length constraint on NQ. Left: no length constraint. Right: length constraint with $\delta_\text{length}=0.1$. Adding the length constraint regulates the length distribution of responses.
  • Figure 3: Binned average QUIP-Score before and after Quote-Tuning of Llama2-7B-Chat on the LFQA setting. Top: NQ; Bottom: ELI5. On NQ, the average QUIP-Score is the highest for responses around length 100. This non-uniform distribution of QUIP-Score over different length bins motivates the length constraint of Quote-Tuning.
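
The rank-by-quoting step in Figure 1 and the length constraint in Figure 2 can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that a QUIP-style score is the fraction of a response's character n-grams found in the trusted corpus, that a plain Python set can stand in for the fast membership test, and that the length constraint bounds the relative character-length difference of a pair by `delta_length`. The function names, the n-gram width, and the exact form of the constraint are hypothetical choices made for illustration.

```python
from itertools import combinations


def char_ngrams(text, n):
    """Return all character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(corpus_docs, n):
    """Collect every character n-gram of the trusted corpus in a set.

    Assumption: an exact set stands in for the fast membership test.
    """
    index = set()
    for doc in corpus_docs:
        index.update(char_ngrams(doc, n))
    return index


def quip_score(response, index, n):
    """Fraction of the response's n-grams that occur in the corpus index."""
    grams = char_ngrams(response, n)
    if not grams:
        return 0.0
    return sum(g in index for g in grams) / len(grams)


def pick_preference_pair(responses, index, n, delta_length=0.1):
    """Pick a (preferred, dispreferred) pair from sampled responses.

    Among pairs whose relative length difference is at most `delta_length`
    (an assumed form of the length constraint), return the pair with the
    largest score gap, or None if no pair qualifies.
    """
    scored = sorted(((quip_score(r, index, n), r) for r in responses), reverse=True)
    best = None
    for (s_hi, r_hi), (s_lo, r_lo) in combinations(scored, 2):
        if s_hi <= s_lo:
            continue  # no quoting preference between equally scored responses
        rel_gap = abs(len(r_hi) - len(r_lo)) / max(len(r_hi), len(r_lo))
        if rel_gap > delta_length:
            continue  # lengths differ too much; skip this pair
        if best is None or s_hi - s_lo > best[0]:
            best = (s_hi - s_lo, r_hi, r_lo)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"]
    index = build_index(corpus, n=5)
    samples = [
        "the quick brown fox",
        "a fast brown animal",
        "the quick brown fox jumps over the lazy dog indeed",
    ]
    print(pick_preference_pair(samples, index, n=5))
```

Pairs selected this way would then serve as the preferred and dispreferred responses for preference optimization, the third step in Figure 1. Filtering on length before ranking keeps the preference signal from rewarding responses merely for being longer or shorter.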