Enhancing Knowledge Retrieval with Topic Modeling for Knowledge-Grounded Dialogue

Nhat Tran, Diane Litman

TL;DR

The paper tackles the knowledge retrieval bottleneck in knowledge-grounded dialogue by applying topic modeling to the knowledge base, yielding topic-guided clusters and per-cluster encoders. These are integrated into a DPR-based retriever (DPR-topic) within a RAG framework (RAG-topic) and its context-aware variant (RAG-context-topic). Incorporating topic distributions into retrieval scores improves top-K recall on two datasets (MultiDoc2Dial and KILT-dialogue), and selecting the number of topics via validation yields robust gains, with optimal values typically around 4–5. The study also evaluates ChatGPT as a generator and finds that it performs best when supplied with relevant retrieved knowledge, achieving higher generation metrics than retrieval-only baselines. Overall, the work shows that topic-informed retrieval can improve both retrieval and generation in knowledge-grounded dialogue, and it highlights practical trade-offs and future directions for integrating such methods with large language models.
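
To make the idea of folding topic distributions into retrieval scores concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes each passage belongs to exactly one topic cluster and is embedded by that cluster's encoder, that the query has one embedding per cluster, and that the final score is the query's probability for the passage's topic multiplied by the dot-product similarity. The function names (`topic_weighted_top_k`, `recall_at_k`) and the scoring rule are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# (passage_id, topic_cluster_index, passage_embedding) -- hypothetical layout
Passage = Tuple[str, int, Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, standing in for a dense-retriever similarity."""
    return sum(a * b for a, b in zip(u, v))


def topic_weighted_top_k(
    query_vecs: Sequence[Sequence[float]],
    query_topic_probs: Sequence[float],
    passages: Sequence[Passage],
    k: int,
) -> List[str]:
    """Rank passages by P(topic | query) * similarity (an assumed rule).

    query_vecs[t] is the query embedding from the encoder of cluster t;
    query_topic_probs[t] is the topic model's probability of topic t
    for the query.
    """
    scored = []
    for passage_id, topic, embedding in passages:
        similarity = dot(query_vecs[topic], embedding)
        scored.append((query_topic_probs[topic] * similarity, passage_id))
    # Highest score first; keep only the top k passage ids.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [passage_id for _, passage_id in scored[:k]]


def recall_at_k(retrieved: Sequence[str], gold_id: str) -> float:
    """Top-K recall for one query: 1.0 if the gold passage was retrieved."""
    return 1.0 if gold_id in retrieved else 0.0
```

Under this assumed rule, a passage from a topic the query is unlikely to belong to is down-weighted even when its raw similarity is high, which is the intuition behind topic-guided retrieval as summarized above.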

Abstract

Knowledge retrieval is one of the major challenges in building a knowledge-grounded dialogue system. A common method is to use a neural retriever with a distributed approximate nearest-neighbor database to quickly find the relevant knowledge sentences. In this work, we propose an approach that utilizes topic modeling on the knowledge base to further improve retrieval accuracy and as a result, improve response generation. Additionally, we experiment with a large language model, ChatGPT, to take advantage of the improved retrieval performance to further improve the generation results. Experimental results on two datasets show that our approach can increase retrieval and generation performance. The results also indicate that ChatGPT is a better response generator for knowledge-grounded dialogue when relevant knowledge is provided.

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, and 10 tables.

Figures (3)

  • Figure 1: The modified retrieve-then-generate framework (based on RAG) with our contribution highlighted. The two topic modeling modules are the same one trained on the knowledge base.
  • Figure 2: An example dialogue from MultiDoc2Dial, borrowed from the MultiDoc2Dial paper. The conversation (on the left) is grounded in three documents: Doc-1, Doc-2, and Doc-3. Each dialogue segment groups turns that are all grounded in the same document (e.g., A3 to A7 in Seg-2 are all grounded in Doc-2). A dialogue turn and its corresponding relevant span in a document are connected by a blue dashed line. The red dotted lines with arrows show how the dialogue flow shifts among the grounding documents over the conversation (e.g., Doc-1 $\rightarrow$ Doc-2 $\rightarrow$ Doc-1 $\rightarrow$ Doc-3).
  • Figure 3: An example dialogue from KILT-dialogue, borrowed from the KILT paper. Two speakers talk about a given topic (e.g., Star Trek), grounded in a Wikipedia page.