Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA

Nirmal Roy, Leonardo F. R. Ribeiro, Rexhina Blloshmi, Kevin Small

TL;DR

This work proposes a method for enabling LLMs to decide when to retrieve in RAG settings given a conversational context, and demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages and assessing the quality of generated responses.

Abstract

Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users' contextual search intent when generating responses is an understudied topic for conversational question answering (QA). This conversational extension leads to additional concerns when compared to single-turn QA as it is more challenging for systems to comprehend conversational context and manage retrieved passages over multiple turns. In this work, we propose a method for enabling LLMs to decide when to retrieve in RAG settings given a conversational context. When retrieval is deemed necessary, the LLM then rewrites the conversation for passage retrieval and judges the relevance of returned passages before response generation. Operationally, we build on the single-turn SELF-RAG framework (Asai et al., 2023) and propose SELF-multi-RAG for conversational settings. SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages (by using summarized conversational context) and assessing the quality of generated responses. Experiments on three conversational QA datasets validate the enhanced response generation capabilities of SELF-multi-RAG, with improvements of ~13% measured by human annotation.
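
The control flow described in the abstract (decide whether to retrieve, rewrite the conversation into a query, filter the returned passages for relevance, then respond) can be illustrated with a minimal sketch. This is not the authors' implementation: in SELF-multi-RAG these decisions are made by a trained LLM, whereas here every component is a hypothetical stand-in. The names `Pipeline`, `Turn`, `keywords`, the toy keyword-overlap heuristics, and the two-passage corpus are all assumptions chosen only to keep the example runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Pipeline:
    # Each callable is a hypothetical stand-in for a decision the LLM makes.
    needs_retrieval: Callable[[List[Turn], List[str]], bool]
    summarize: Callable[[List[Turn]], str]
    retrieve: Callable[[str], List[str]]
    is_relevant: Callable[[str, str], bool]
    generate: Callable[[List[Turn], List[str]], str]
    cached_passages: List[str] = field(default_factory=list)
    retrieval_calls: int = 0

    def respond(self, conversation: List[Turn]) -> str:
        # Step 1: decide whether new passages are needed for this turn.
        if self.needs_retrieval(conversation, self.cached_passages):
            # Step 2: rewrite the conversation into a retrieval query.
            query = self.summarize(conversation)
            self.retrieval_calls += 1
            # Step 3: keep only passages judged relevant to the query.
            for passage in self.retrieve(query):
                if self.is_relevant(query, passage) and passage not in self.cached_passages:
                    self.cached_passages.append(passage)
        # Step 4: respond using passages kept from this or earlier turns.
        return self.generate(conversation, self.cached_passages)


# Toy components (assumptions for illustration only).
CORPUS = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
]


def keywords(text: str) -> Set[str]:
    # Lowercased words longer than four characters, punctuation stripped.
    cleaned = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 4}


def toy_needs_retrieval(conv: List[Turn], cached: List[str]) -> bool:
    # Skip retrieval when a cached passage already overlaps the latest question.
    question = keywords(conv[-1].text)
    return not any(question & keywords(p) for p in cached)


pipeline = Pipeline(
    needs_retrieval=toy_needs_retrieval,
    summarize=lambda conv: " ".join(t.text for t in conv if t.role == "user"),
    retrieve=lambda query: list(CORPUS),  # toy retriever returns every passage
    is_relevant=lambda query, p: bool(keywords(query) & keywords(p)),
    generate=lambda conv, passages: passages[0] if passages else "I don't know.",
)

conv = [Turn("user", "Where is the Eiffel Tower?")]
first = pipeline.respond(conv)
conv += [Turn("assistant", first), Turn("user", "When was it completed?")]
second = pipeline.respond(conv)
print(pipeline.retrieval_calls, pipeline.cached_passages)
```

In this toy run the follow-up question is answered from the passage kept during the first turn, so the retriever is called only once across the two turns. A real system would replace each lambda with a model call.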

Paper Structure (33 sections, 1 equation, 5 figures, 17 tables)

This paper contains 33 sections, 1 equation, 5 figures, 17 tables.

Figures (5)

  • Figure 1: Understanding conversational context. In multi-turn conversations user questions often refer to responses in previous turns based on passages already retrieved, as shown in the example above. To answer the follow-up question, it is not necessary to retrieve new passages and the LLM should refer back to the previously retrieved passage, which contains the response.
  • Figure 2: Summarizing conversational context. Using the entire conversational context as a query to a retrieval model might introduce noise, while traditional rewriting methods might miss important aspects of the conversation. Conversation summaries provide an adaptable alternative as a retrieval query.
  • Figure 3: SELF-multi-RAG framework. Components of the pipeline highlighted in yellow are specific to multi-turn conversations. The critic model is used to obtain the special reflection tokens that the generator model is trained to predict while generating a response.
  • Figure 4: (a) Relation between retrieval calls and number of turns considered in QReCC and UltraChat. (b) Answer quality measured by BERTScore for different turn configurations.
  • Figure 5: Screenshot of the human annotation template for the response quality measurement.