Adaptive Retrieval-Augmented Generation for Conversational Systems

Xi Wang; Procheta Sen; Ruizhe Li; Emine Yilmaz

TL;DR

The paper asks whether retrieval-augmented generation (RAG) should be applied at every turn of a conversational system. It introduces RAGate, a gating mechanism that decides from the conversation context whether a system response should be augmented with external knowledge, and presents three implementation variants: prompting a pre-trained language model (Prompt), parameter-efficient fine-tuning (PEFT with QLoRA), and a multi-head attention (MHA) encoder. In experiments on KETOD, a task-oriented dialogue (TOD) dataset, adaptive augmentation yields high-quality, faithful responses while maintaining generation confidence, often with fewer augmented turns and no loss in performance. The approach offers practical benefits in reducing hallucinations and retrieval costs, pointing toward more efficient and reliable knowledge-augmented dialogue systems.

Abstract

Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses. Hence, many existing studies commonly assume that Retrieval-Augmented Generation (RAG) is always needed in a conversational system, without explicit control. This raises a research question about whether it is necessary. In this study, we investigate whether each turn of system response needs to be augmented with external knowledge. In particular, by leveraging human judgements on the binary choice of adaptive augmentation, we develop RAGate, a gating model that models the conversation context and relevant inputs to predict whether a conversational system requires RAG for improved responses. We conduct extensive experiments on devising and applying RAGate to conversational models, together with well-rounded analyses of different conversational scenarios. Our experimental results and analysis indicate that RAGate can be applied effectively in RAG-based conversational systems to identify the system responses that warrant RAG, yielding high-quality responses with high generation confidence. This study also identifies a correlation between the generation's confidence level and the relevance of the augmented knowledge.
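
The gating idea in the abstract can be illustrated with a minimal sketch: a binary gate looks at the conversation context and decides whether the next system turn is generated with or without retrieved knowledge. This is not the authors' implementation. The names `adaptive_respond`, `toy_gate`, `toy_retrieve`, and `toy_generate`, the 0.5 threshold, and the keyword heuristic are all hypothetical stand-ins chosen for illustration; in the paper the gate is a learned model (Prompt, PEFT, or MHA variant).

```python
from typing import Callable, List, Optional


def adaptive_respond(
    context: List[str],
    gate: Callable[[List[str]], float],
    retrieve: Callable[[List[str]], List[str]],
    generate: Callable[[List[str], Optional[List[str]]], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Generate a system turn, retrieving knowledge only when the gate fires."""
    needs_rag = gate(context) >= threshold
    # Retrieval is skipped entirely when the gate predicts no augmentation.
    knowledge = retrieve(context) if needs_rag else None
    return generate(context, knowledge)


# Hypothetical toy components standing in for a trained gate,
# a retriever, and a language model.
def toy_gate(context: List[str]) -> float:
    # Keyword heuristic used only for illustration.
    return 1.0 if "recommend" in context[-1].lower() else 0.0


def toy_retrieve(context: List[str]) -> List[str]:
    return ["snippet about: " + context[-1]]


def toy_generate(context: List[str], knowledge: Optional[List[str]]) -> str:
    if knowledge:
        return f"[augmented with {len(knowledge)} snippet(s)] reply"
    return "[no augmentation] reply"


if __name__ == "__main__":
    print(adaptive_respond(["Can you recommend a hotel?"],
                           toy_gate, toy_retrieve, toy_generate))
    print(adaptive_respond(["Thanks, goodbye."],
                           toy_gate, toy_retrieve, toy_generate))
```

Passing the gate, retriever, and generator in as callables keeps the control flow separate from any particular model, so a learned gate could replace the toy heuristic without changing `adaptive_respond`.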

Paper Structure (14 sections, 2 equations, 4 figures, 4 tables)

Introduction
Related Work
Methodology
Problem Formulation
RAGate Gate Mechanism
Model Training and Evaluation Setups
Results and Analysis
Augmentation Need Classification
Adaptive Augmentation Analysis
RAGate for Response Generation
Conclusions
Prompts for RAGate-Prompt
Impact of Retrieval Quality on Adaptive RAG
Additional experimental results about RAGate for Response Generation

Figures (4)

Figure 1: Example conversation when generating a response with or without a knowledge snippet using a language model (GPT-4 in this example).
Figure 2: RAGate variants for implementing the gating function. The three variants are the prediction with pre-trained language models after prompting (1), after parameter-efficient fine-tuning (2), and with a multi-head attention encoder (3).
Figure 3: Frequency analysis of adaptive augmentations by turn position in a conversation.
Figure 4: Frequency analysis of adaptive augmentations by dialogue domain.
