Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering

Parag Jain; Mirella Lapata

TL;DR

This work tackles conversational question answering over heterogeneous sources, requiring context tracking and robust reasoning across text, infoboxes, tables, and knowledge graphs. It introduces a dynamic graph representation of retrieved evidence and learns graph embeddings that are injected into a large language model, augmented by a memory module that stores past evidence to guide future reasoning. The model is trained end-to-end with cross-entropy, using a two-stage retrieval pipeline (Wikidata via CLOCQ and Wikipedia) to build the evidence graph, and a Graph Attention Network to reason over the graph before querying the LLM. On ConvMix, graph embeddings improve reasoning over multiple sources, and the memory module enhances robustness to noise and retrieval errors, with the best results achieved by Mistral-7B + Graph + Memory.
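
The graph reasoning step mentioned above relies on a Graph Attention Network. As a rough illustration of what one attention layer computes, the sketch below implements a single-head layer in plain Python. It follows the standard textbook formulation, not necessarily the configuration used in the paper. The function names, the fixed weights `W` and `a` (which would be learned in practice), and the toy graph are all assumptions made for illustration. The output nonlinearity is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(
    h: List[Vector], adj: Dict[int, List[int]], W: List[Vector], a: Vector
) -> Tuple[List[Vector], List[Dict[int, float]]]:
    """Single-head graph attention: each node aggregates itself and its neighbors."""
    z = [matvec(W, x) for x in h]  # shared linear projection of every node
    out, attn = [], []
    for i in range(len(h)):
        nbrs = [i] + [j for j in adj.get(i, []) if j != i]  # add a self-loop
        # Unnormalized score: LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbor features
        out.append(
            [sum(al * z[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(z[i]))]
        )
        attn.append(dict(zip(nbrs, alphas)))
    return out, attn


# Toy graph (hypothetical): a three-node chain 0 - 1 - 2 with 2-dimensional features.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 0.25, 0.25]    # attention vector of length 2 * output dim

out, attn = gat_layer(h, adj, W, a)
for i, weights in enumerate(attn):
    print(i, weights)
```

In each row of `attn` the weights sum to one, so every updated node embedding is a convex combination of the projected features in its neighborhood.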

Abstract

We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes. Our method utilizes a graph-structured representation to aggregate information about a question and its context (i.e., the conversation so far and evidence retrieved to find an answer), while also harnessing the reasoning and text generation capabilities of large language models (LLMs). Graph embeddings are directly injected into the LLM, bypassing the token embedding layers, and learned end-to-end by minimizing cross-entropy. Our model maintains a memory module to track and update past evidence, thus influencing the graph's structure as the conversation evolves. Experimental results on the ConvMix benchmark (Christmann et al., 2022a) show that graph embeddings enhance the LLM's ability to reason, while the memory module provides robustness against noise and retrieval errors.
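
To make the injection step concrete, the following toy sketch shows one way graph embeddings could be spliced between prompt embeddings so that they skip the token lookup. Plain Python lists stand in for tensors. The names `embed_tokens` and `build_llm_inputs`, the embedding table, and the 3-dimensional vectors are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]


def embed_tokens(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up each prompt token in a (toy) token embedding table."""
    return [table[tok] for tok in tokens]


def build_llm_inputs(
    prefix: List[str],
    suffix: List[str],
    graph_embeddings: List[Vector],
    table: Dict[str, Vector],
) -> List[Vector]:
    """Concatenate prefix, graph, and suffix embeddings into one input sequence.

    Graph embeddings are already vectors, so they bypass the token lookup;
    they only need to match the hidden size of the token embeddings.
    """
    dim = len(next(iter(table.values())))
    for vec in graph_embeddings:
        if len(vec) != dim:
            raise ValueError("graph embeddings must match the LLM hidden size")
    return embed_tokens(prefix, table) + graph_embeddings + embed_tokens(suffix, table)


# Toy 3-dimensional embedding space (an assumption for illustration only).
table = {
    "Answer": [0.1, 0.0, 0.2],
    "the": [0.0, 0.3, 0.1],
    "question": [0.2, 0.2, 0.0],
    ":": [0.0, 0.0, 0.1],
}
# Stand-in for the output of a graph encoder: one vector per graph node.
graph_embeddings = [[0.5, 0.5, 0.5], [0.9, 0.1, 0.4]]

inputs = build_llm_inputs(["Answer", "the"], ["question", ":"], graph_embeddings, table)
print(len(inputs))  # number of positions fed to the LLM
```

In a real system the resulting sequence would be handed to the transformer layers in place of the usual token embeddings, which is what allows the loss to backpropagate into the graph encoder.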

Paper Structure (26 sections, 8 equations, 6 figures, 4 tables)

This paper contains 26 sections, 8 equations, 6 figures, 4 tables.

Introduction
Related Work
Conversational Question Answering
LLMs with Graphs
Retrieval-augmented Generation
Overview
Model
Evidence Retrieval
Evidence Memory
Graph Construction
Graph Encoder
Integration with LLMs
Training
Experimental Setup
Dataset
...and 11 more sections

Figures (6)

Figure 1: Example interaction (left) from the ConvMix development set (Christmann et al., 2022a) and relevant evidence at query Q3 (right). Utterances Q1--Q3 explore the topic of album Kid A. Q4 transitions to the topic of Rolling Stone magazine. The evidence is retrieved from diverse sources highlighted in red. Wikipedia text and tables are prepended with their respective article title. Known entities are shown in blue. Underlined entities are identified through string matching.
Figure 2: Graph for retrieved evidence (subset) from Figure 1. Tokens within each instance create local subgraphs in the form of a linear chain. Local subgraphs are connected through common entities (within <n> -- </n>) to build a global graph. Same color highlights connections between similar entities (some edges are omitted for clarity).
Figure 3: Sketch of the proposed architecture. The input is query Q3 from the interaction in Figure 1. KG triples and their entities are retrieved with CLOCQ, and Wikipedia articles for these entities are parsed to extract sentences, infoboxes, and tables. The retrieved evidence is ranked against the current query using BM25. An instruction prompt is created from the input query (see the prompts appendix for the prompt template). A graph is constructed from the top-ranked instances and encoded by a learned graph neural network. Graph node embeddings are initialized using LLM token embeddings that are separate from the base model. The final embeddings passed to the LLM are obtained by concatenating prompt (prefix, suffix) and graph embeddings (shown in different colors). The LLM is used without its token embedding layer.
Figure 4: Analysis experiments for different model variants based on Mistral-7B: prompted in a zero-shot setting, fine-tuned on ConvMix without graph embeddings (+FT), with graph embeddings (+Graph), and with a memory module (+Graph +Memory). Performance degrades with numbers, tables, and later conversation turns.
Figure 5: Example prompt for models which do not employ graph embeddings. Only a few relevant pieces of evidence are shown, for the sake of brevity.
...and 1 more figures
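
The graph construction described in the Figure 2 caption can be sketched roughly as follows: tokens within each evidence instance form a linear chain, and chains are linked wherever they mention the same entity. The function name `build_evidence_graph`, the `(text, is_entity)` token format, the case-insensitive entity matching, and the toy evidence are assumptions for illustration; the paper's actual tokenization and entity linking may differ.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]    # (instance index, position within the instance)
Token = Tuple[str, bool]  # (surface text, is_entity)
Edge = Tuple[Node, Node]


def build_evidence_graph(instances: List[List[Token]]) -> Set[Edge]:
    """Build chain edges inside each instance and entity edges across instances."""
    edges: Set[Edge] = set()
    mentions: Dict[str, List[Node]] = defaultdict(list)

    for i, tokens in enumerate(instances):
        for pos, (text, is_entity) in enumerate(tokens):
            if pos > 0:
                # Local subgraph: consecutive tokens form a linear chain.
                edges.add(((i, pos - 1), (i, pos)))
            if is_entity:
                mentions[text.lower()].append((i, pos))

    # Global graph: connect mentions of the same entity in different instances.
    for nodes in mentions.values():
        for a, b in combinations(nodes, 2):
            if a[0] != b[0]:
                edges.add((a, b))
    return edges


# Toy evidence (invented for illustration): a KG-style triple and a short text snippet.
instances = [
    [("Kid A", True), ("performer", False), ("Radiohead", True)],
    [("Radiohead", True), ("genre", False), ("rock", False)],
]

edges = build_evidence_graph(instances)
for edge in sorted(edges):
    print(edge)
```

Here the two instances are joined by a single cross-instance edge between the two "Radiohead" nodes, in addition to the two chain edges inside each instance.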
