Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

Zijun Liu, Zhennan Wan, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

TL;DR

Scaling external knowledge input beyond context windows remains a key bottleneck for knowledge-intensive tasks. ExtAgents introduces a two-role multi-agent framework that separates Seeking Agents from a central Reasoning Agent, with global synchronization and progressive knowledge-accumulating reasoning to integrate massive inputs without extending the model’s context window. The authors formalize the problem, build the ∞Bench+ benchmark to stress long-document reasoning, and demonstrate that ExtAgents consistently improves multi-hop QA and long-survey generation across benchmarks and LLM families, while maintaining high parallelism and reasonable latency. The work advances practical scalability of inference-time knowledge integration and outlines future directions in adaptive orchestration, cross-modal reasoning, and safety.

Abstract

With the rapid advancement of post-training techniques for reasoning and information seeking, large language models (LLMs) can incorporate a large quantity of retrieved knowledge to solve complex tasks. However, the limited context window of LLMs obstructs scaling the amount of external knowledge input and prevents further improvement, especially for tasks requiring a significant amount of external knowledge. Existing context window extension methods inevitably cause information loss. LLM-based multi-agent methods have emerged as a new paradigm for handling massive input in a distributed manner, and we identify two core bottlenecks in their existing knowledge synchronization and reasoning processes. In this work, we develop a multi-agent framework, ExtAgents, to overcome these bottlenecks and enable better scalability in inference-time knowledge integration without longer-context training. Benchmarked on our enhanced multi-hop question answering test, $\infty$Bench+, and other public test sets including long survey generation, ExtAgents significantly improves performance over existing non-training methods given the same amount of external knowledge input, regardless of whether that input falls within or exceeds the context window. Moreover, the method maintains high efficiency due to high parallelism. Further study of the coordination of LLM agents on increasing external knowledge input could benefit real-world applications.
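
To make the two-role design summarized above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a word count as a stand-in for the context-window budget and keyword overlap as a stand-in for an LLM call. All function names (`split_into_chunks`, `seeking_agent`, `synchronize`, `reasoning_agent`, `run_pipeline`) are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(documents, max_words):
    """Greedily pack documents into chunks of at most max_words words.

    The word count is an assumed stand-in for a token budget. A single
    document longer than max_words still gets its own (oversized) chunk.
    """
    chunks, current, size = [], [], 0
    for doc in documents:
        n = len(doc.split())
        if current and size + n > max_words:
            chunks.append(current)
            current, size = [], 0
        current.append(doc)
        size += n
    if current:
        chunks.append(current)
    return chunks


def seeking_agent(question, chunk):
    """Hypothetical Seeking Agent: keep documents sharing words with the question."""
    q_words = set(question.lower().split())
    found = []
    for doc in chunk:
        overlap = len(q_words & set(doc.lower().split()))
        if overlap > 0:
            found.append((overlap, doc))
    return found


def synchronize(per_agent_results):
    """Toy global synchronization: merge every agent's findings into one ranking."""
    merged = [item for result in per_agent_results for item in result]
    merged.sort(key=lambda pair: pair[0], reverse=True)  # stable, highest score first
    return [doc for _, doc in merged]


def reasoning_agent(ranked_docs, max_words):
    """Toy knowledge accumulation: add evidence in rank order until the budget is full."""
    accumulated, used = [], 0
    for doc in ranked_docs:
        n = len(doc.split())
        if used + n > max_words:
            break
        accumulated.append(doc)
        used += n
    return accumulated


def run_pipeline(question, documents, max_words=12):
    """Run the seeking agents in parallel, synchronize, then accumulate evidence."""
    chunks = split_into_chunks(documents, max_words)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda chunk: seeking_agent(question, chunk), chunks))
    ranked = synchronize(results)
    return reasoning_agent(ranked, max_words)
```

In this sketch each seeking agent only ever sees one chunk, so the total input can grow past the per-agent budget while the final evidence handed to the reasoning step still fits inside it. In a real system both the scoring and the final reasoning would be LLM calls rather than word overlap.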

Paper Structure

This paper contains 38 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Performance of scaling external knowledge input with ExtAgents and LLM$\times$MapReduce (Zhou et al., 2024) on $\infty$Bench+.
  • Figure 2: Illustration of scaling external knowledge input with context window extension methods for LLMs. Ideally, knowledge-intensive tasks, including QA and long generation, should benefit from scaled input.
  • Figure 3: Overview of ExtAgents: Our framework consists of multiple agents with fixed context windows that collaboratively process (a) scalable external knowledge inputs beyond the context limit. It features (b) global knowledge synchronization and (c) knowledge-accumulating reasoning processes. Moreover, ExtAgents supports (d) both multi-hop QA and long survey generation tasks.
  • Figure 4: Ablation studies on global knowledge synchronization (GKS) and knowledge-accumulating reasoning (KAR) on HotpotQA.
  • Figure 5: Experiments on scaling external knowledge input on multi-hop QA tasks with gpt-4o-mini. (a) The top row shows the performance of ExtAgents and retrieval methods on HotpotQA. (b) The middle and bottom rows show the performance of ExtAgents and LLM$\times$MapReduce on En.QA and Zh.QA, respectively. The rightmost subfigure of each part shows the baseline result.
  • ...and 4 more figures