Adaptive Retrieval-Augmented Generation for Conversational Systems

Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz

TL;DR

The paper asks whether retrieval-augmented generation (RAG) should be applied at every turn of a conversational system. It introduces RAGate, a gating mechanism that uses the conversation context to decide whether a system response should be augmented with external knowledge, and presents three implementation variants: prompting, parameter-efficient fine-tuning (PEFT) with QLoRA, and a multi-head attention (MHA) encoder. Experiments on KETOD, a task-oriented dialogue (TOD) dataset, show that adaptive augmentation can produce high-quality, faithful responses with high generation confidence, often augmenting fewer turns without losing performance. The approach may reduce both hallucinations and retrieval costs, which matters for building more efficient and reliable knowledge-augmented dialogue systems.

Abstract

Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses. Hence, many existing studies assume that Retrieval Augmented Generation (RAG) is always needed in a conversational system, without explicit control. This raises a research question about whether it is necessary. In this study, we investigate whether each turn of system response needs to be augmented with external knowledge. In particular, by leveraging human judgements on the binary choice of adaptive augmentation, we develop RAGate, a gating model that models the conversation context and relevant inputs to predict whether a conversational system requires RAG for improved responses. We conduct extensive experiments on devising and applying RAGate to conversational models, together with well-rounded analyses of different conversational scenarios. Our experimental results and analysis indicate that RAGate can be applied effectively in RAG-based conversational systems to identify the system responses that benefit from RAG, yielding high-quality responses with high generation confidence. This study also identifies a correlation between the generation's confidence level and the relevance of the augmented knowledge.
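The gating idea in the abstract can be illustrated with a minimal sketch: a binary gate inspects the conversation context, and retrieval runs only when the gate returns true. Everything named here is hypothetical and chosen for illustration (`Turn`, `keyword_gate`, `respond`, and the stub retriever and generator). In particular, the keyword rule is a toy stand-in and is not the paper's RAGate model, which is learned from human judgements.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


Context = List[Turn]


def keyword_gate(context: Context) -> bool:
    """Hypothetical stand-in for a learned gate (NOT the paper's RAGate model).

    Returns True when the latest user turn contains a cue suggesting
    that external knowledge would help.
    """
    cues = ("what", "where", "when", "recommend", "tell me about")
    last_user = next(
        (t.text.lower() for t in reversed(context) if t.speaker == "user"), ""
    )
    return any(cue in last_user for cue in cues)


def respond(
    context: Context,
    gate: Callable[[Context], bool],
    retrieve: Callable[[Context], str],
    generate: Callable[[Context, Optional[str]], str],
) -> str:
    """Retrieve knowledge only when the gate asks for it, then generate."""
    knowledge = retrieve(context) if gate(context) else None
    return generate(context, knowledge)


# Stub retriever and generator so the sketch runs without any model.
retrieval_calls: List[str] = []


def stub_retrieve(context: Context) -> str:
    retrieval_calls.append(context[-1].text)  # record that retrieval happened
    return "The museum opens at 9am."


def stub_generate(context: Context, knowledge: Optional[str]) -> str:
    return f"[augmented] {knowledge}" if knowledge else "[plain] Sure, done."


chitchat = [Turn("user", "Thanks, book it please.")]
question = [Turn("user", "Can you recommend a museum nearby?")]

print(respond(chitchat, keyword_gate, stub_retrieve, stub_generate))
print(respond(question, keyword_gate, stub_retrieve, stub_generate))
print(len(retrieval_calls))  # number of turns that triggered retrieval
```

Passing the gate in as a callable keeps the pipeline unchanged when the toy rule is swapped for a prompted language model, a fine-tuned classifier, or an encoder-based predictor.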

Paper Structure

This paper contains 14 sections, 2 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Example conversation when generating a response with or without a knowledge snippet using a language model (GPT-4 in this example).
  • Figure 2: RAGate variants for implementing the gating function. The three variants are the prediction with pre-trained language models after prompting (1), after parameter-efficient fine-tuning (2), and with a multi-head attention encoder (3).
  • Figure 3: Frequency analysis of adaptive augmentations by turn position within a conversation.
  • Figure 4: Frequency analysis of adaptive augmentations by dialogue domain.