SSFO: Self-Supervised Faithfulness Optimization for Retrieval-Augmented Generation

Xiaqiang Tang, Yi Wang, Keyu Hu, Rui Xu, Chuang Li, Weigao Sun, Jian Li, Sihong Xie

TL;DR

SSFO tackles faithfulness hallucination in retrieval-augmented generation with a self-supervised alignment framework that builds preference data from the model’s own outputs generated with and without access to the retrieved context. Using Direct Preference Optimization, SSFO nudges the model to prefer context-grounded outputs without external annotations, and its analysis reveals a benign likelihood displacement mechanism that shifts probability mass toward context-consistent tokens. A variant, SSFO-λ, further amplifies this displacement to strengthen faithfulness, achieving state-of-the-art results across multiple faithfulness metrics and models while preserving instruction-following capabilities and generalizing to cross-lingual tasks. The method is data-efficient, requiring only hundreds of self-generated examples, and incurs negligible inference overhead, making it practical for broad deployment in RAG systems.

Abstract

Retrieval-Augmented Generation (RAG) systems require Large Language Models (LLMs) to generate responses that are faithful to the retrieved context. However, faithfulness hallucination remains a critical challenge, as existing methods often require costly supervision and post-training or significant inference burdens. To overcome these limitations, we introduce Self-Supervised Faithfulness Optimization (SSFO), the first self-supervised alignment approach for enhancing RAG faithfulness. SSFO constructs preference data pairs by contrasting the model's outputs generated with and without the context. Leveraging Direct Preference Optimization (DPO), SSFO aligns model faithfulness without incurring labeling costs or additional inference burden. We theoretically and empirically demonstrate that SSFO leverages a benign form of likelihood displacement, transferring probability mass from parametric-based tokens to context-aligned tokens. Based on this insight, we propose a modified DPO loss function to encourage likelihood displacement. Comprehensive evaluations show that SSFO significantly outperforms existing methods, achieving state-of-the-art faithfulness on multiple context-based question-answering datasets. Notably, SSFO exhibits strong generalization, improving cross-lingual faithfulness and preserving general instruction-following capabilities. We release our code and model at the anonymous link: https://github.com/chkwy/SSFO
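The pair construction and preference objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt templates, the `PreferencePair` type, and the `build_pair` and `dpo_loss` helpers are hypothetical names introduced here, `generate` stands in for any text-generation callable, and the loss shown is the standard DPO objective rather than the modified SSFO-λ loss, whose exact form is not given in this summary.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    prompt: str    # query together with the retrieved context
    chosen: str    # response generated with the context (preferred)
    rejected: str  # response generated from the query alone (dispreferred)


def build_pair(generate: Callable[[str], str], query: str, context: str) -> PreferencePair:
    """Build one self-supervised pair by contrasting the model's own outputs.

    The prompt templates below are assumptions made for illustration only.
    """
    prompt_with_context = f"Context: {context}\nQuestion: {query}"
    prompt_without_context = f"Question: {query}"
    chosen = generate(prompt_with_context)
    rejected = generate(prompt_without_context)
    # Both responses are later scored under the context-bearing prompt.
    return PreferencePair(prompt_with_context, chosen, rejected)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch the loss equals log 2 when the policy and reference model agree, and it falls as the policy raises the log-probability of the context-grounded response relative to the query-only response.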

Paper Structure

This paper contains 30 sections, 12 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: (a) Existing post-training methods rely on human annotators or stronger LLMs to construct SFT or preference datasets, resulting in heavy labeling costs and lengthy post-training processes. (b) SSFO leverages the model itself to generate preference data: given query $x$, it generates a context-grounded response $y_c'$ (with external knowledge) and a parametric-based response $y_p$ (query only). SSFO reduces faithfulness hallucination without external supervision and incurs negligible post-training costs.
  • Figure 2: Left: Log-likelihood of the preferred response $\pi_\theta(y_c'|x,c)$ versus the dispreferred response $\pi_\theta(y_p|x,c)$ over the course of SSFO optimization. Right: We compare the base instruct model and the optimized model on the MemoTrap dataset and show the mean change for context-based tokens $z_c$ and parametric-based tokens $z_p$, revealing that optimization increases $\Delta P(z_c)$ while decreasing $\Delta P(z_p)$. $r$ denotes the Pearson correlation coefficient.
  • Figure 3: Case study from the MemoTrap dataset illustrating benign likelihood displacement. The probability mass shifts from the parametric-knowledge-based token $z_p$ to the external-knowledge-based token $z_c$ after SSFO optimization.
  • Figure 4: Correlation plot illustrating Span Exact Match scores for NQ-Swap, NQ-Open, and MemoTrap (scaled to the left y-axis) and ROUGE-L F1 scores for ELI5 (scaled to the right y-axis). Grey lines depict the regression trends. $r$ denotes Pearson correlation.
  • Figure 5: Data efficiency study: SSFO requires about 60% of the data (400–500 examples) to achieve 85% of the total performance gain over the instruct baseline.
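The probability shift shown in Figures 2 and 3 can be measured along the following lines. This is a hedged sketch under stated assumptions: next-token logits from the base and optimized models are taken as given, the helper names `softmax`, `displacement`, and `pearson` are hypothetical, and the captions do not say which quantities $r$ is computed over, so correlating $\Delta P(z_c)$ with $\Delta P(z_p)$ across examples is only one plausible reading.

```python
import math


def softmax(logits):
    """Convert next-token logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def displacement(base_logits, tuned_logits, context_id, parametric_id):
    """Return (delta P(z_c), delta P(z_p)) between base and optimized models.

    context_id and parametric_id are the vocabulary indices of the
    context-based token z_c and the parametric-based token z_p.
    """
    base = softmax(base_logits)
    tuned = softmax(tuned_logits)
    return (tuned[context_id] - base[context_id],
            tuned[parametric_id] - base[parametric_id])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

A positive first value together with a negative second value from `displacement` corresponds to the benign displacement pattern the captions describe: mass moves toward $z_c$ and away from $z_p$.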