Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval
Xiaojun Wu, Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Yuanliang Sun, Hui Xiong, Jia Li, Jian Guo
TL;DR
ToG-3 tackles the bottlenecks of static GraphRAG and LLM-dependent graph extraction by introducing MACER, a Multi-Agent Context Evolution and Retrieval loop, and a Chunk-Triplets-Community heterogeneous graph index. This dual-evolving mechanism adaptively refines both the query and the retrieved subgraph during reasoning, enabling precise evidence gathering even with lightweight LLMs. The approach yields state-of-the-art performance on deep multi-hop benchmarks and robust results on broad-domain tasks, with ablations confirming the critical role of evolving query and graph refinement. Practically, ToG-3 offers a scalable, deployable RAG solution that improves faithfulness and reasoning depth while reducing reliance on large pre-built knowledge graphs.
Abstract
Graph-based Retrieval-Augmented Generation (GraphRAG) has become an important paradigm for enhancing Large Language Models (LLMs) with external knowledge. However, existing approaches are constrained by their reliance on high-quality knowledge graphs: manually built ones are not scalable, while automatically extracted ones are limited by the performance of LLM extractors, especially when using smaller, locally deployed models. To address this, we introduce Think-on-Graph 3.0 (ToG-3), a novel framework featuring a Multi-Agent Context Evolution and Retrieval (MACER) mechanism. Its core contribution is the dynamic construction and iterative refinement of a Chunk-Triplets-Community heterogeneous graph index, powered by a Dual-Evolution process that adaptively evolves both the query and the retrieved sub-graph during reasoning. ToG-3 dynamically builds a targeted graph index tailored to the query, enabling precise evidence retrieval and reasoning even with lightweight LLMs. Extensive experiments demonstrate that ToG-3 outperforms the compared baselines on both deep and broad reasoning benchmarks, and ablation studies confirm the efficacy of the components of the MACER framework. The source code is available at https://github.com/DataArcTech/ToG-3.
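To make the dual-evolution idea concrete, the following is a minimal toy sketch of a loop in which both the query and the retrieved sub-graph change between rounds. It is not the ToG-3 implementation: the names `HeteroIndex`, `retrieve`, `is_sufficient`, `evolve_query`, and `dual_evolving_loop` are hypothetical, and simple keyword matching stands in for the LLM agents and retrievers that the abstract describes.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroIndex:
    """Toy index with the three node types named in the abstract.

    Hypothetical structure; only `triplets` is used by this sketch.
    """
    chunks: dict = field(default_factory=dict)       # chunk_id -> text
    triplets: set = field(default_factory=set)       # (head, relation, tail)
    communities: dict = field(default_factory=dict)  # community_id -> entities


def retrieve(index, query):
    """Keyword stand-in for a retriever: match triplets by entity mention."""
    q = query.lower()
    return {t for t in index.triplets if t[0].lower() in q or t[2].lower() in q}


def is_sufficient(subgraph, target_relation):
    """Stand-in for an LLM sufficiency check: is the needed relation present?"""
    return any(t[1] == target_relation for t in subgraph)


def evolve_query(query, subgraph):
    """Query evolution: append entities found in the evidence but not yet in the query."""
    entities = sorted({e for t in subgraph for e in (t[0], t[2])})
    new = [e for e in entities if e.lower() not in query.lower()]
    return query + " " + " ".join(new) if new else query


def dual_evolving_loop(index, query, target_relation, max_rounds=3):
    """Alternate sub-graph growth and query rewriting until evidence suffices."""
    subgraph = set()
    for round_no in range(1, max_rounds + 1):
        subgraph |= retrieve(index, query)      # sub-graph evolution
        if is_sufficient(subgraph, target_relation):
            return subgraph, query, round_no
        query = evolve_query(query, subgraph)   # query evolution
    return subgraph, query, max_rounds


# Two-hop toy example: the second fact is reachable only after the query evolves.
index = HeteroIndex(triplets={
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
})
subgraph, final_query, rounds = dual_evolving_loop(
    index, "Which country has the birthplace of Marie Curie as its capital?",
    target_relation="capital_of",
)
print(rounds, sorted(subgraph))
```

In this toy run the first round retrieves only the birthplace fact; the evolved query then mentions Warsaw, which lets the second round reach the capital fact. In a real system, each stand-in function would be replaced by a model call or retriever.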
