AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering

Zheyuan Zhang, Kaiwen Shi, Zhengqing Yuan, Zehong Wang, Tianyi Ma, Keerthiram Murugesan, Vincent Galassi, Chuxu Zhang, Yanfang Ye

TL;DR

AgentRouter tackles the problem of selecting and coordinating among heterogeneous LLM-based agents for question answering by formulating it as a knowledge-graph-guided routing task. It constructs a knowledge graph that jointly encodes queries, contextual entities, and candidate agents, and trains a type-aware RouterGNN to produce a task-specific distribution over agents, using KL divergence to align routing with empirical performance $p^*(a|q)$. The final answer is obtained by a weighted fusion $\hat{y}(q)=\phi(\{y_a(q),p_\theta(a|q,\mathcal{G})\})$, enabling principled collaboration that leverages complementary strengths. Extensive experiments across multi-hop and direct QA benchmarks show AgentRouter outperforms single agents and prior routing baselines, with robust generalization across backbones and tasks, and reveal the value of contextual graph signals and soft supervision for adaptive collaboration.
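The two quantities named in the TL;DR, the KL alignment term and the weighted fusion $\phi$, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the helper names (`softmax`, `kl_divergence`, `weighted_vote`), the use of a temperature softmax over per-agent F1 scores to build $p^*(a|q)$, the direction $\mathrm{KL}(p^* \,\|\, p_\theta)$, and the choice of a weighted vote over answer strings as one possible $\phi$. The router logits stand in for the output of a trained RouterGNN, which is not modeled.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p_star, p_theta, eps=1e-12):
    """KL(p* || p_theta); the direction is an assumption of this sketch."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_star, p_theta)
        if p > 0
    )


def weighted_vote(answers, weights):
    """One hypothetical phi: sum router weight per distinct answer, keep the top."""
    totals = {}
    for answer, weight in zip(answers, weights):
        totals[answer] = totals.get(answer, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical per-agent F1 scores for one query, and hypothetical router logits.
agent_f1 = [0.8, 0.2, 0.5]
router_logits = [2.0, 0.1, 1.0]

p_star = softmax(agent_f1, temperature=0.5)   # soft supervision target
p_theta = softmax(router_logits)              # router's predicted distribution
loss = kl_divergence(p_star, p_theta)         # quantity a trainer would minimize

agent_answers = ["Paris", "Lyon", "Paris"]
print(loss, weighted_vote(agent_answers, p_theta))
```

A soft target such as `p_star` keeps information about how much better one agent is than another, which a hard one-hot label for the best agent would discard; that is the intuition behind the soft supervision mentioned above.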

Abstract

Large language models (LLMs) and agent-based frameworks have advanced rapidly, enabling diverse applications. Yet, with the proliferation of models and agentic strategies, practitioners face substantial uncertainty in selecting the best configuration for a downstream task. Prior studies show that different agents and backbones exhibit complementary strengths, and that larger models are not always superior, underscoring the need for adaptive routing mechanisms. Existing approaches to agent routing, however, often emphasize cost efficiency while overlooking the fine-grained contextual and relational structure inherent in QA tasks. In this paper, we propose AgentRouter, a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals. Specifically, we convert each QA instance into a knowledge graph that jointly encodes queries, contextual entities, and agents, and then train a heterogeneous graph neural network (GNN) to propagate information across node types and produce task-aware routing distributions over agents. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents. Extensive experiments demonstrate that our framework consistently outperforms single-agent and ensemble baselines, while generalizing across benchmarks and LLM backbones. These results highlight the effectiveness and robustness of graph-supervised multi-agent routing for question answering.

Paper Structure

This paper contains 26 sections, 9 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Our Motivation. Performance variance across six classical agent designs (raw, CoT, self-consistency, MAD, ReAct-Reflect, and summary), each applied with the same prompt template. Results are shown for two LLM backbones (Llama-3-8B-Instruct, Mixtral-8$\times$7B-Instruct) on two benchmarks (HotpotQA, NewsQA). Each violin plot depicts the distribution of F1 scores across test instances, with the white dot indicating the median and the black bar the interquartile range. The plots show that agents sharing the same backbone yield wide and non-overlapping distributions, and that the relative ranking of agents varies substantially across tasks and backbones.
  • Figure 2: Overview of our proposed framework. (a) QA instances are converted into knowledge graphs with query, entity, and agent nodes, with edges defined to capture semantic relations. Query–agent edges are trainable to enable adaptive routing. (b) A type-aware heterogeneous RouterGNN propagates contextual and relational information across the graph. The router then predicts a task-dependent distribution over agents, trained via KL divergence against empirical agent performance. Final answers are obtained by fusing agent outputs according to the predicted per-query weight vector over agents.
  • Figure 3: Percentage change ($\Delta$) in F1 (left) and EM (right) relative to $k=24$, used as the base (0%). Curves show how performance varies across datasets as routing is clipped to the top-$k$ agents.
  • Figure 4: F1 performance on four QA benchmarks, varying Layers (top) and Hidden Dimensions (bottom). Error bars denote standard deviations.
  • Figure 5: F1 performance on HotpotQA and NewsQA under different temperature settings. Moderate values ($0.3$--$0.6$) yield the strongest results.
  • ...and 4 more figures