GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

TL;DR

GraphReader is a graph-based agent that converts a long text into a graph of key elements and atomic facts, which lets it explore the text coarse-to-fine within a small context window. The agent first drafts a rational plan and selects initial nodes, then iteratively reads atomic facts, chunks, and neighboring nodes, accumulating evidence in a notebook before reasoning to a final answer with chain-of-thought and majority voting. On LV-Eval, GraphReader with a 4k context window outperforms GPT-4-128k at context lengths from 16k to 256k, and it also performs strongly on four challenging single-hop and multi-hop benchmarks, with favorable efficiency and recall metrics. The approach offers a scalable, graph-guided alternative to fixed-context LLMs and conventional retrieval pipelines, though it relies on a closed-source API and could benefit from open-source development and further optimization of planning capabilities.
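
To make the graph structure concrete, here is a minimal Python sketch of how key elements and atomic facts might be organized into nodes and edges. It is an illustration under stated assumptions, not the authors' implementation: the atomic facts and key elements are hand-written here (in GraphReader they are extracted from each chunk by an LLM), and the linking rule, where two key elements become neighbors when they appear in the same atomic fact, is an assumption made for this example. The name `build_graph` is hypothetical.

```python
from collections import defaultdict


def build_graph(atomic_facts):
    """Group atomic facts under their key elements and link co-occurring elements.

    atomic_facts: list of (chunk_id, fact_text, key_elements) tuples.
    Returns (nodes, neighbors):
      nodes:     key element -> list of (chunk_id, fact_text)
      neighbors: key element -> sorted list of linked key elements
    """
    nodes = defaultdict(list)
    neighbors = defaultdict(set)
    for chunk_id, fact, elements in atomic_facts:
        keys = {el.lower() for el in elements}  # normalize element names
        for key in keys:
            nodes[key].append((chunk_id, fact))
            # Assumed linking rule: elements sharing a fact are neighbors.
            neighbors[key] |= keys - {key}
    return dict(nodes), {k: sorted(v) for k, v in neighbors.items()}


# Hand-written stand-ins for LLM-extracted atomic facts.
facts = [
    (0, "Marie Curie was born in Warsaw.", ["Marie Curie", "Warsaw"]),
    (1, "Marie Curie won the Nobel Prize in Physics in 1903.",
     ["Marie Curie", "Nobel Prize"]),
]
nodes, neighbors = build_graph(facts)
print(neighbors["marie curie"])
```

Each node keeps a pointer back to the chunk its facts came from, so an agent can start from short atomic facts and open the full chunk only when a fact looks relevant.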

Abstract

Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on current circumstances to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
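
The exploration loop described above can be sketched in a few lines of Python. This is a toy illustration, not the paper's agent: the LLM's relevance judgments are replaced by a simple keyword match (`relevant`), the graph and chunks are hard-coded, and the names `explore`, `max_steps`, and `question_terms` are hypothetical. It only shows the control flow of reading atomic facts first, opening a chunk when a fact looks useful, recording it in a notebook, and then moving on to neighboring nodes.

```python
def explore(question_terms, nodes, neighbors, chunks, max_steps=10):
    """Coarse-to-fine walk: atomic facts -> chunks -> neighbors."""

    def relevant(text):
        # Stand-in for the LLM's judgment: any question term occurs in the text.
        return any(term in text.lower() for term in question_terms)

    notebook, visited = [], set()
    # Initial node selection: nodes whose name matches a question term.
    queue = [name for name in nodes if relevant(name)]
    steps = 0
    while queue and steps < max_steps:
        node = queue.pop(0)
        if node in visited:
            continue
        visited.add(node)
        steps += 1
        found_evidence = False
        for chunk_id, fact in nodes[node]:
            if relevant(fact):  # coarse: inspect the short atomic fact
                found_evidence = True
                note = chunks[chunk_id]  # fine: read the full chunk
                if note not in notebook:
                    notebook.append(note)
        if found_evidence:  # only expand from nodes that proved useful
            queue.extend(n for n in neighbors.get(node, []) if n not in visited)
    return notebook


# Toy graph and chunks (hypothetical data for the example).
chunks = {
    0: "Chunk 0: Marie Curie was born in Warsaw in 1867.",
    1: "Chunk 1: Marie Curie won the Nobel Prize in Physics in 1903.",
    2: "Chunk 2: Warsaw is the capital of Poland.",
}
nodes = {
    "marie curie": [(0, "Marie Curie was born in Warsaw."),
                    (1, "Marie Curie won the Nobel Prize in Physics in 1903.")],
    "warsaw": [(0, "Marie Curie was born in Warsaw."),
               (2, "Warsaw is the capital of Poland.")],
    "nobel prize": [(1, "Marie Curie won the Nobel Prize in Physics in 1903.")],
    "poland": [(2, "Warsaw is the capital of Poland.")],
}
neighbors = {
    "marie curie": ["nobel prize", "warsaw"],
    "warsaw": ["marie curie", "poland"],
    "nobel prize": ["marie curie"],
    "poland": ["warsaw"],
}
notebook = explore(["curie", "born"], nodes, neighbors, chunks)
print(notebook)
```

In this toy run the walk never opens chunk 2, because none of its atomic facts match the question terms. That is the point of the coarse-to-fine design: only chunks whose facts look relevant need to be read, which is how the agent can stay within a small context window.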

Paper Structure (53 sections, 24 figures, 9 tables)

Figures (24)

  • Figure 1: Performance on LV-Eval at 5 context length levels. GraphReader outperforms existing open-source and closed-source models and scales to very long contexts, whereas the performance of other models drops significantly as context length increases.
  • Figure 2: The illustration of our GraphReader approach, consisting of graph construction, graph exploration, and answer reasoning.
  • Figure 3: Performance of GraphReader with different numbers of initial nodes on 2WikiMultihopQA and NarrativeQA. Results show that GraphReader is robust to the number of initial nodes.
  • Figure 4: Impact of GraphReader's chunk size $L$ at the 256k length level of HotpotWikiQA-mixup.
  • Figure 5: Recall of supporting facts by different methods on HotpotWikiQA-mixup.
  • ...and 19 more figures