Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

Ruotong Pan; Boxi Cao; Hongyu Lin; Xianpei Han; Jia Zheng; Sirui Wang; Xunliang Cai; Le Sun

TL;DR

RAG systems are vulnerable to noisy, outdated, and incorrect retrieved content. The paper introduces Credibility-aware Generation (CAG), a universal framework that injects multi-granularity credibility signals via a data-transformation pipeline and instruction fine-tuning, and validates it with the CAGB benchmark across open-domain, time-sensitive, and misinformation-rich QA scenarios. CAG demonstrates improved accuracy and robustness, outperforming retrieval-augmented baselines and maintaining performance under increasing context noise. The work also shows that customizing credibility enables applications like personalized responses and conflict resolution between evidence sources, highlighting practical impact for reliable information access in real-world AI systems.

Abstract

The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from flawed information introduced during the retrieval phase, which diminishes the reliability and correctness of the generated outcomes. In this paper, we propose Credibility-aware Generation (CAG), a universally applicable framework designed to mitigate the impact of flawed information in RAG. At its core, CAG aims to equip models with the ability to discern and process information based on its credibility. To this end, we propose an innovative data transformation framework that generates data based on credibility, thereby endowing models with the capability of CAG. Furthermore, to accurately evaluate models' CAG capabilities, we construct a comprehensive benchmark covering three critical real-world scenarios. Experimental results demonstrate that our model can effectively understand and utilize credibility for generation, significantly outperforms other models with retrieval augmentation, and remains resilient to the disruption caused by noisy documents, thereby maintaining robust performance. Moreover, our model supports customized credibility, offering a wide range of potential applications.
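As a rough illustration of what "processing information based on its credibility" could look like at the input level, the sketch below attaches a credibility label to each retrieved document before building a prompt. It is only a minimal sketch under stated assumptions: the `Document` class, the `build_prompt` function, the three label values, and the prompt wording are all hypothetical and are not taken from the paper, and the example documents are invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    text: str
    # Hypothetical label set for illustration: "high", "medium", or "low".
    credibility: str


def build_prompt(question: str, docs: List[Document]) -> str:
    """Format retrieved documents with explicit credibility labels.

    The wording of the instructions is an assumption, not the paper's template.
    """
    lines = [
        "Answer the question using the documents below.",
        "Weigh each document according to its credibility label.",
        "",
    ]
    for i, doc in enumerate(docs, start=1):
        lines.append(f"[{i}] ({doc.credibility} credibility) {doc.text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Invented placeholder documents that disagree with each other.
docs = [
    Document("Source A reports the value as 10.", "high"),
    Document("Source B reports the value as 12.", "low"),
]
prompt = build_prompt("What is the value?", docs)
print(prompt)
```

A model fine-tuned for credibility-aware generation would then be expected to favor the high-credibility document when the two conflict; the sketch covers only the prompt construction, not the training.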

Paper Structure (43 sections, 2 equations, 14 figures, 14 tables)

Introduction
Credibility-aware Generation
Definition
Teaching Model to Credibility-aware Generation
Multi-granularity Credibility Annotation
Credibility-guided Explanation Generation
Instruction Fine-tuning
Credibility-aware Generation Benchmark
Credibility Assessment
Open-domain QA
Time-sensitive QA
Misinformation Polluted QA
Experiments
Setup
Baselines
...and 28 more sections

Figures (14)

Figure 1: The comparison between Retrieval-Augmented Generation (RAG) and Credibility-aware Generation (CAG). Incorporating credibility into the model aids in mitigating errors caused by flawed information introduced from the retrieval process.
Figure 2: Overview of the data transformation framework. The training data is constructed by assigning credibility to contexts via multi-granularity credibility annotation (§ Multi-granularity Credibility Annotation) and prompting an LLM to produce credibility-guided explanations (§ Credibility-guided Explanation Generation). The processed data is then used for instruction fine-tuning (§ Instruction Fine-tuning) to endow the model with the ability for Credibility-aware Generation.
Figure 3: The performance of LLMs under varying noise ratios, where the noise ratio denotes the proportion of retrieved documents that are noise. As the noise ratio increases, the performance of other methods declines markedly; in contrast, our model maintains stable performance at high noise ratios, which is attributed to its enhanced ability to prioritize accurate information.
Figure 4: Performance comparison across six datasets between LLMs in the setting where low-credibility documents are discarded and CAG-7B.
Figure 5: CAG provides personalized responses. CAG combines user preferences with customized credibility to offer personalized responses.
...and 9 more figures
