Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Jingyu Zhang; Marc Marone; Tianjian Li; Benjamin Van Durme; Daniel Khashabi

TL;DR

This work investigates verifiability in large language models by training them to quote verbatim from trusted pre-training sources, notably Wikipedia. It introduces Quote-Tuning, which uses a fast membership test to quantify quoting via QUIP-Score, builds a synthetic preference dataset that favors quoting, and applies Direct Preference Optimization (DPO) to obtain a model biased toward quoting. Across long-form QA and open-ended text completion, Quote-Tuning yields substantial quoting gains with minimal or no loss in generation quality, even improves truthfulness on TruthfulQA, and generalizes to new domains and model families. The approach offers a scalable, annotation-free path to verifiable, trustworthy LLM outputs that complements retrieval-based and citation-grounded methods.
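To make the quoting measure concrete, here is a minimal sketch of a QUIP-Score-style computation: the fraction of a response's character n-grams that a membership test finds in a trusted corpus. Several details are assumptions rather than facts taken from this summary: the Bloom filter standing in for the fast membership test, the n-gram length of 25 characters, the filter size and hash count, and the helper names `BloomFilter`, `build_index`, and `quip_score`, which are hypothetical.

```python
import hashlib
import math
from typing import Iterable, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: approximate set membership with no false
    negatives and a small, tunable false-positive rate (assumed backend)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(math.ceil(num_bits / 8))

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def char_ngrams(text: str, n: int) -> List[str]:
    """All overlapping character n-grams of `text` (empty if len < n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(corpus: Iterable[str], n: int = 25,
                num_bits: int = 1 << 20, num_hashes: int = 4) -> BloomFilter:
    """Hypothetical helper: insert every character n-gram of the trusted corpus."""
    index = BloomFilter(num_bits, num_hashes)
    for doc in corpus:
        for gram in char_ngrams(doc, n):
            index.add(gram)
    return index


def quip_score(response: str, index: BloomFilter, n: int = 25) -> float:
    """Hypothetical QUIP-Score-style metric: fraction of the response's
    character n-grams reported as present in the index."""
    grams = char_ngrams(response, n)
    if not grams:
        return 0.0  # response shorter than a single n-gram
    return sum(gram in index for gram in grams) / len(grams)


corpus = ["The Eiffel Tower is a wrought-iron lattice tower in Paris, France."]
index = build_index(corpus)
print(quip_score("The Eiffel Tower is a wrought-iron lattice tower in Paris.", index))
```

In this toy example, only the final n-gram of the response (the one ending in "Paris.") is absent from the corpus, so the score is about 33/34 barring a Bloom filter false positive. Because a Bloom filter never misses an inserted n-gram but can occasionally report one that was never added, a score computed this way can only overestimate true quoting, at a rate controlled by `num_bits` and `num_hashes`.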

Abstract

To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function that quantifies quotes in model responses and to curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality. Quote-Tuning applies to different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness. It not only serves as a hassle-free way to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
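The preference-learning step can be illustrated with the standard DPO objective, where the preferred response of each pair is the one that quotes more. This is a minimal pure-Python sketch for a single pair; the function names and the default β = 0.1 are illustrative assumptions, and a real training loop would compute batched sequence log-probabilities with a deep learning framework.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, dispreferred) pair.

    Each argument is a summed token log-probability log pi(y|x) of the
    preferred ("chosen") or dispreferred ("rejected") response under the
    trainable policy or the frozen reference model. beta=0.1 is an
    assumed default, not a value taken from this summary.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# Policy identical to the reference: the loss equals log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

When the policy still matches the reference model, the loss is log 2 (about 0.693); it falls as the policy raises the likelihood of the more-quoting response relative to the less-quoting one, measured against the reference.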

Paper Structure (32 sections, 2 equations, 3 figures, 11 tables)

Introduction
Preliminaries
Quantifying Quoting
Preference Optimization
Aligning LLMs to Quote with Quote-Tuning
Constraint 1: quoting.
Constraint 2: length.
Desirability of Quoting
Experiments
Improving Quoting in Long-Form QA
Task Construction
Baselines
Metrics
Results
Improving Quoting in Open-Ended Text Completion
...and 17 more sections

Figures (3)

Figure 1: Pipeline of Quote-Tuning. The algorithm works by (1) sampling multiple responses from a pre-trained LLM, (2) constructing preference data via rank-by-quoting, and (3) preference optimization to quote.
Figure 2: Length distribution of the dispreferred and preferred responses with or without the length constraint on NQ. Left: no length constraint. Right: added length constraint with $\delta_\text{length}=0.1$. Adding the length constraint properly regulates the length distribution of responses.
Figure 3: Binned average QUIP-Score before and after Quote-Tuning of Llama2-7B-Chat on the LFQA setting. Top: NQ; Bottom: ELI5. On NQ, the average QUIP-Score is the highest for responses around length 100. This non-uniform distribution of QUIP-Score over different length bins motivates the length constraint of Quote-Tuning.
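The rank-by-quoting step in Figure 1, together with the length constraint illustrated in Figure 2, might be sketched as follows. The exact form of both constraints is an assumption here: a pair is kept only if the quoting-score gap exceeds a hypothetical threshold `delta_quote` and the relative length difference is at most `delta_length` (0.1, matching the value in Figure 2), with length measured in characters rather than tokens. `build_preference_pairs` is a hypothetical helper, and `score` can be any quoting scorer, such as a QUIP-Score function.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def build_preference_pairs(
    responses: List[str],
    score: Callable[[str], float],
    delta_quote: float = 0.1,   # hypothetical quoting-gap threshold
    delta_length: float = 0.1,  # relative length tolerance (cf. Figure 2)
) -> List[Tuple[str, str]]:
    """Return (preferred, dispreferred) pairs among sampled responses.

    Assumed constraints, not confirmed by this summary:
      1. quoting: the two scores must differ by more than delta_quote;
      2. length: the lengths may differ by at most delta_length,
         relative to the longer response.
    """
    scored = [(r, score(r)) for r in responses]  # score each sample once
    pairs = []
    for (a, sa), (b, sb) in combinations(scored, 2):
        if abs(sa - sb) <= delta_quote:
            continue  # constraint 1: quoting gap too small
        longer = max(len(a), len(b))
        if longer == 0 or abs(len(a) - len(b)) / longer > delta_length:
            continue  # constraint 2: lengths too different
        # Rank by quoting: the higher-scoring response is preferred.
        pairs.append((a, b) if sa > sb else (b, a))
    return pairs


toy_scores = {"aaaaaaaaaa": 0.9, "bbbbbbbbbb": 0.2, "cccccccccc": 0.85, "dd": 0.0}
print(build_preference_pairs(list(toy_scores), toy_scores.get))
```

In the toy example, the pair of responses scored 0.9 and 0.85 is dropped because their quoting gap is too small, and every pair involving the two-character response is dropped by the length constraint, which keeps the preference signal from collapsing into a simple length preference.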
