Context-DPO: Aligning Language Models for Context-Faithfulness

Baolong Bi, Shaohan Huang, Yiwei Wang, Tianchi Yang, Zihan Zhang, Haizhen Huang, Lingrui Mei, Junfeng Fang, Zehao Li, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Shenghua Liu

TL;DR

This work tackles the critical challenge of ensuring LLM outputs remain faithful to provided context in retrieval-augmented settings. It introduces ConFiQA, a benchmark that simulates knowledge-conflict scenarios via counterfactual contexts, and Context-DPO, a direct preference optimization method tailored to context-faithfulness alignment. Empirical results show Context-DPO substantially improves context-faithfulness across multiple open-source models (up to 280.1% in $P_c$) without harming generative capabilities, and interpretability analyses reveal how contextual tokens drive faithful outputs. The approach offers a principled, internal-model alignment solution for context-dependent tasks and sets the stage for applying context-faithfulness optimization to broader RAG applications.

Abstract

Reliable responses from large language models (LLMs) require adherence to user instructions and retrieved information. While alignment techniques help LLMs align with human intentions and values, improving context-faithfulness through alignment remains underexplored. To address this, we propose Context-DPO, the first alignment method specifically designed to enhance LLMs' context-faithfulness. We introduce ConFiQA, a benchmark that simulates Retrieval-Augmented Generation (RAG) scenarios with knowledge conflicts to evaluate context-faithfulness. By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, our Context-DPO aligns LLMs through direct preference optimization. Extensive experiments demonstrate that our Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models. Further analysis demonstrates that Context-DPO preserves LLMs' generative capabilities while providing interpretable insights into context utilization. Our code and data are released at https://github.com/byronBBL/Context-DPO
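
The preference-optimization step mentioned in the abstract can be illustrated with the standard DPO objective, where the faithful response plays the role of the preferred answer and the stubborn response the rejected one. The sketch below is not the authors' released code: the function name `dpo_loss`, the default `beta=0.1`, and the sample log-probabilities are assumptions chosen for illustration only.

```python
import math


def dpo_loss(policy_logp_faithful, policy_logp_stubborn,
             ref_logp_faithful, ref_logp_stubborn, beta=0.1):
    """Standard DPO loss for one (faithful, stubborn) response pair.

    Each argument is the summed log-probability of a full response given
    the prompt (context + question), under either the policy being trained
    or the frozen reference model. beta is a hypothetical default.
    """
    # How much more the policy favors each response than the reference does.
    faithful_gain = policy_logp_faithful - ref_logp_faithful
    stubborn_gain = policy_logp_stubborn - ref_logp_stubborn
    margin = beta * (faithful_gain - stubborn_gain)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative numbers only (not taken from the paper).
no_preference = dpo_loss(-3.0, -3.0, -3.0, -3.0)
prefers_faithful = dpo_loss(-1.0, -5.0, -3.0, -3.0)
prefers_stubborn = dpo_loss(-5.0, -1.0, -3.0, -3.0)
```

The loss falls below its no-preference value when the policy raises the likelihood of the faithful response relative to the reference model, and rises above it when the policy favors the stubborn response.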

Paper Structure

This paper contains 38 sections, 1 equation, 7 figures, 11 tables.

Figures (7)

  • Figure 1: LLMs may generate unfaithful responses when model knowledge conflicts with context, as shown in our case where GPT-3.5 stubbornly answers Jack Dorsey, ignoring user instruction or retrieved passage.
  • Figure 2: An illustration of aligning LLMs for context-faithfulness using our Context-DPO framework, demonstrated with 2-hop data from ConFiQA’s MC task. The process consists of four steps: 1) construct counterfactuals, questions, and responses based on sampled facts; 2) generate factual context using descriptions of head entities from the original triples, then edit entity-related words to create counterfactual context; 3) build preference data comprising questions, concatenated contexts, and faithful and stubborn responses; 4) align LLMs’ faithfulness using DPO.
  • Figure 3: Visualization of LLMs' context-faithfulness across different tasks in the ConFiQA benchmark.
  • Figure 4: Average logits (%) of key tokens faithful to contextual knowledge, comparing base models and models aligned using our Context-DPO.
  • Figure 5: Kernel density estimation of the softmax probability distribution for context-faithful tokens.
  • ...and 2 more figures
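
Step 3 in the Figure 2 caption (building preference data from a question, concatenated contexts, and faithful and stubborn responses) might look roughly like the sketch below. The record layout, the field names `prompt`, `chosen`, and `rejected`, the prompt template, and the placeholder entities are all hypothetical; the actual ConFiQA data format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prompt: str    # concatenated contexts followed by the question
    chosen: str    # faithful response: follows the provided context
    rejected: str  # stubborn response: follows the model's own knowledge


def build_preference_pair(question: str, contexts: List[str],
                          faithful: str, stubborn: str) -> PreferencePair:
    """Assemble one preference record (hypothetical prompt template)."""
    context = "\n".join(contexts)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return PreferencePair(prompt=prompt, chosen=faithful, rejected=stubborn)


# Placeholder 2-hop example; the entities are invented for illustration.
pair = build_preference_pair(
    question="In which country is the headquarters of Company A located?",
    contexts=[
        "Company A is headquartered in City X.",  # counterfactual hop 1
        "City X is located in Country Y.",        # counterfactual hop 2
    ],
    faithful="Country Y",
    stubborn="Country Z",
)
```

Records of this shape can then be passed to a DPO training loop, with the faithful response treated as preferred and the stubborn response as rejected.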