MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs

Yerin Hwang, Yongil Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung

TL;DR

This work tackles data scarcity in topic-shift dialogues by introducing MP2D, a framework that automatically generates conversational QA data with natural topic transitions. MP2D uses knowledge-graph paths to link related topics, retrieves a passage for each topic, and applies a Passage-to-Dialogue process in which an LLM writes a question for each answer, producing dialogues that shift smoothly across topics. The paper also introduces the TS-WikiDialog benchmark and shows that LLM-based question generation performs well in multi-passage scenarios, while fine-tuned models benefit from MP2D-generated data for topic segmentation and topic-shift detection. The approach enables scalable generation of topic-shift data, improves downstream ConvQA tasks, and provides a practical benchmark for evaluating LLM robustness to topic shifts, though cost and entity disambiguation remain considerations.

Abstract

Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions. By leveraging the relationships between entities in a knowledge graph, MP2D maps the flow of topics within a dialogue, effectively mirroring the dynamics of human conversation. It retrieves relevant passages corresponding to the topics and transforms them into dialogues through the passage-to-dialogue method. Through quantitative and qualitative experiments, we demonstrate MP2D's efficacy in generating dialogue with natural topic shifts. Furthermore, this study introduces a novel benchmark for topic shift dialogues, TS-WikiDialog. Utilizing the dataset, we demonstrate that even Large Language Models (LLMs) struggle to handle topic shifts in dialogue effectively, and we showcase the performance improvements of models trained on datasets generated by MP2D across diverse topic shift dialogue tasks.
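
The pipeline described in the abstract (a knowledge-graph path sets the topic flow, passages for each entity supply the answers, and a model writes a question for each answer) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy graph and passages, the greedy path selection in `find_path`, the relation-sentence template, and the `placeholder_question` stub that stands in for the LLM question generator.

```python
from typing import Callable, Optional

# Toy knowledge graph (illustrative): entity -> list of (relation, neighbour).
TOY_KG = {
    "Marie Curie": [("spouse", "Pierre Curie")],
    "Pierre Curie": [("educated at", "University of Paris")],
    "University of Paris": [],
}

# Toy passages (illustrative), already split into answer-sized sentences.
TOY_PASSAGES = {
    "Marie Curie": ["Marie Curie won Nobel Prizes in physics and chemistry."],
    "Pierre Curie": ["Pierre Curie shared the 1903 Nobel Prize in Physics."],
    "University of Paris": ["The University of Paris was a university in Paris, France."],
}


def find_path(kg: dict, start: str, length: int) -> list[tuple[Optional[str], str]]:
    """Greedily follow the first unvisited neighbour (a hypothetical strategy).

    Returns (relation, entity) pairs; the start entity has relation None.
    """
    path: list[tuple[Optional[str], str]] = [(None, start)]
    visited = {start}
    current = start
    while len(path) < length:
        step = next(((r, e) for r, e in kg.get(current, []) if e not in visited), None)
        if step is None:  # dead end: stop with a shorter path
            break
        path.append(step)
        visited.add(step[1])
        current = step[1]
    return path


def placeholder_question(answer: str, history: list[dict]) -> str:
    """Stand-in for the LLM call that would write a question for `answer`."""
    return f"[question for: {answer}]"


def build_dialogue(
    path: list[tuple[Optional[str], str]],
    passages: dict,
    ask: Callable[[str, list[dict]], str] = placeholder_question,
) -> list[dict]:
    """Turn passages and relations into answers, then attach a question to each."""
    turns: list[dict] = []
    previous = None
    for relation, entity in path:
        answers = list(passages.get(entity, []))
        if relation is not None:
            # The relation between consecutive entities becomes the answer
            # of the topic-shift turn (template wording is an assumption).
            answers.insert(0, f"{previous} has the relation '{relation}' to {entity}.")
        for i, answer in enumerate(answers):
            turns.append({
                "topic": entity,
                "topic_shift": relation is not None and i == 0,
                "question": ask(answer, turns),
                "answer": answer,
            })
        previous = entity
    return turns


path = find_path(TOY_KG, "Marie Curie", 3)
dialogue = build_dialogue(path, TOY_PASSAGES)
for turn in dialogue:
    print(turn["topic_shift"], "|", turn["question"], "->", turn["answer"])
```

In a real setting, `ask` would wrap a call to a question-generation model and receive the dialogue history as context; the stub keeps the sketch runnable without any external service.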

Paper Structure

This paper contains 34 sections, 5 figures, 12 tables.

Figures (5)

  • Figure 1: An example of a topic shift dialogue. The MP2D framework utilizes paths in a Knowledge Graph (KG) to extract entities and facilitates natural topic transitions based on the relations between these entities.
  • Figure 2: An overview of the MP2D framework. In the knowledge graph, paths are identified and passages are retrieved for entities within those paths. Then, the retrieved passages and their relations become the "answers", and an LLM generates "questions" corresponding to each answer to create dialogues.
  • Figure 3: ConvQA response generation performance of GPT-3.5. Each score is a BLEU-4 score, where t_TS denotes a topic shift turn.
  • Figure 4: Interface of human evaluation. (1/2)
  • Figure 5: Interface of human evaluation. (2/2)