Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models

Jie Ma; Zhitao Gao; Qi Chai; Wangchun Sun; Pinghui Wang; Hongbin Pei; Jing Tao; Lingyun Song; Jun Liu; Chen Zhang; Lizhen Cui

TL;DR

DoG (Debating over Graphs) is an iterative, interactive KGQA framework that uses the interactive learning capabilities of LLMs to reason and debate over knowledge graphs. Its subgraph-focusing mechanism lets the LLM attempt an answer after each reasoning step, which mitigates the impact of lengthy reasoning paths.

Abstract

Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical touchstone for the integration. This task requires LLMs to answer natural language questions by retrieving relevant triples from knowledge graphs. However, existing methods face two significant challenges: excessively long reasoning paths that distract from answer generation, and false-positive relations that hinder path refinement. In this paper, we propose an iterative interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism that allows LLMs to attempt an answer after each reasoning step ("answer trying"), thereby mitigating the impact of lengthy reasoning paths. In addition, DoG uses a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations. This debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7% and 9.1% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, integration experiments with various LLMs on these datasets highlight the flexibility of DoG. Code is available at https://github.com/reml-group/DoG.
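
The iterative loop described in the abstract (retrieve a triple, try to answer, otherwise simplify the question and continue) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`dog_loop`, `pick_relation`, `try_answer`, `simplify`), the toy knowledge graph, and the "relation of entity" question format are all assumptions made for illustration. In particular, the three helper functions are deterministic string-matching stand-ins for what would be LLM calls in DoG, and the multi-role debate is reduced to a single `simplify` step.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
KG = Dict[str, List[Tuple[str, str]]]  # head entity -> [(relation, tail), ...]


def dog_loop(
    question: str,
    topic_entity: str,
    kg: KG,
    pick_relation: Callable[[str, str, List[str]], Optional[str]],
    try_answer: Callable[[str, Triple], Optional[str]],
    simplify: Callable[[str, Triple], str],
    max_steps: int = 4,
) -> Optional[str]:
    """Hypothetical sketch of an iterate / try-answer / simplify loop."""
    entity = topic_entity
    for _ in range(max_steps):
        candidates = kg.get(entity, [])
        if not candidates:
            return None  # dead end: no outgoing edges for this entity
        relations = sorted({r for r, _ in candidates})
        relation = pick_relation(question, entity, relations)
        if relation is None:
            return None  # no relation judged relevant
        tail = next(t for r, t in candidates if r == relation)
        triple = (entity, relation, tail)
        # Attempt an answer after every step, using only the latest triple.
        answer = try_answer(question, triple)
        if answer is not None:
            return answer
        # Otherwise rewrite the question into an easier one and move on.
        question = simplify(question, triple)
        entity = tail
    return None


# --- Toy stand-ins for LLM calls (assumptions, pure string matching) ---
def pick_relation(question: str, entity: str, relations: List[str]) -> Optional[str]:
    for r in relations:
        if f"{r} of {entity}" in question:
            return r
    return None


def try_answer(question: str, triple: Triple) -> Optional[str]:
    head, relation, tail = triple
    return tail if question == f"{relation} of {head}" else None


def simplify(question: str, triple: Triple) -> str:
    head, relation, tail = triple
    return question.replace(f"{relation} of {head}", tail)


toy_kg: KG = {
    "Tom Hanks": [("born_in", "Concord"), ("starred_in", "Forrest Gump")],
    "Forrest Gump": [("director", "Robert Zemeckis"), ("release_year", "1994")],
}

answer = dog_loop(
    "director of starred_in of Tom Hanks",
    "Tom Hanks", toy_kg, pick_relation, try_answer, simplify,
)
print(answer)  # Robert Zemeckis
```

In this toy run the first step retrieves (Tom Hanks, starred_in, Forrest Gump), cannot answer yet, and simplifies the question to "director of Forrest Gump"; the second step answers it from a single triple. The point of the sketch is only that each answer attempt sees one triple and a shortened question rather than the full reasoning path.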

Paper Structure (25 sections, 6 figures, 3 tables)

Introduction
Related Work
Method
Task Formulation
Overview
Knowledge Graph Invoking
Relation Filtering
Answer Trying
Question Simplifying
Experiments
Dataset and Evaluation
Implementation Settings
Baselines
Reasoning on Knowledge Graphs
Main Result
...and 10 more sections

Figures (6)

Figure 1: Illustration of challenges and our solutions.
Figure 2: DoG framework. Given a question, our framework first enables LLMs to interact with knowledge graphs to retrieve the most relevant triple. Subsequently, it employs a subgraph-focusing mechanism, allowing LLMs to attempt answering at each reasoning step. If further reasoning is required, DoG leverages a multi-role LLM team to simplify the question from complex to easy based on the retrieved triples.
Figure 3: Impact of debate rounds for LLM reasoning on knowledge graphs. It is unnecessary to simplify the question for the 1-hop question within MetaQA.
Figure 4: Impacts of the number of exemplars on the performance of LLM reasoning. It is unnecessary to perform question simplifying for the 1-hop question within MetaQA. DoG does not utilize LLMs to generate answers for questions within MetaQA. Instead, it provides answers based on the last retrieved triple after iterative reasoning.
Figure 5: Analysis of 50 sampled failure cases per dataset. We visualize the proportion of factors contributing to errors. We do not perform manual inspection for the failure cases in CWQ and WebQ due to the lack of annotations, such as those for the ground-truth relations.
...and 1 more figure
