ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
Jiani Guo, Zuchao Li, Jie Wu, Qianren Wang, Yun Li, Lefei Zhang, Hai Zhao, Yujiu Yang
TL;DR
This work addresses reasoning over ultra-long contexts in large language models by introducing ToM, a tree-oriented MapReduce framework. ToM builds a DocTree by applying a Hierarchical Semantic Parser to document chunks and aggregating the resulting subtrees bottom-up, then performs recursive MapReduce reasoning over the hierarchy, generating rationales at the leaves and reconciling them on the way up to the root. The approach addresses limitations of RAG and divide-and-conquer frameworks by preserving cross-chunk relationships and enabling conflict-resolving aggregation across siblings and ancestors. Experiments on 70B+-parameter LLMs show ToM outperforming baselines on long-context reasoning, with notable gains on ultra-long QA and multiple-choice tasks. The results indicate that structured, hierarchical reasoning improves coherence and information integration in long-context scenarios.
Abstract
Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM.
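The recursive Map and Reduce steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `DocNode` class, the `tree_map_reduce` function, and the `toy_map` and `toy_reduce` stand-ins are hypothetical names introduced here, and the two toy functions replace what would be LLM calls in a real system. The sketch also assumes that a parent node holding text of its own contributes a rationale alongside its children, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DocNode:
    """Hypothetical DocTree node: a heading, its own text, and child sections."""
    title: str
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


# (question, node) -> rationale; would be an LLM call in practice.
MapFn = Callable[[str, DocNode], str]
# (question, parent node, rationales) -> merged rationale; also an LLM call in practice.
ReduceFn = Callable[[str, DocNode, List[str]], str]


def tree_map_reduce(node: DocNode, question: str,
                    map_fn: MapFn, reduce_fn: ReduceFn) -> str:
    """Map at the leaves, then reduce sibling rationales at each parent, bottom-up."""
    if not node.children:
        return map_fn(question, node)
    # Recurse first so every child subtree is already reduced to one rationale.
    rationales = [tree_map_reduce(child, question, map_fn, reduce_fn)
                  for child in node.children]
    # Assumption: a parent with its own text also produces a rationale.
    if node.text:
        rationales.insert(0, map_fn(question, node))
    return reduce_fn(question, node, rationales)


def toy_map(question: str, node: DocNode) -> str:
    # Stand-in for the Map step: keep the text only if it mentions the keyword.
    return node.text if question.lower() in node.text.lower() else ""


def toy_reduce(question: str, node: DocNode, rationales: List[str]) -> str:
    # Stand-in for the Reduce step: drop empty rationales and join the rest.
    return " | ".join(r for r in rationales if r)


root = DocNode("Report", children=[
    DocNode("Methods", "We used a tree parser."),
    DocNode("Results", children=[
        DocNode("QA", "Tree reasoning improved QA."),
        DocNode("Cost", "Latency was unchanged."),
    ]),
])

answer = tree_map_reduce(root, "tree", toy_map, toy_reduce)
print(answer)
```

Passing the Map and Reduce steps in as functions keeps the traversal independent of any particular model or prompt. In this toy version the Reduce step only filters and concatenates, whereas the framework described above uses it to resolve conflicts or reach consensus among sibling rationales.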
