When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Soyeong Jeong, Taehee Jung, Sung Ju Hwang, Joo-Kyung Kim, Dongyeop Kang

TL;DR

ToTAL addresses the challenge of knowledge-intensive, multi-hop reasoning in ultra-long contexts by introducing reusable thought templates that structure how evidence is integrated by long-context LLMs. The framework constructs initial templates from training data, composes multiple templates during inference, and iteratively refines them via textual gradient feedback without updating model weights. Across four benchmarks and multiple LLM families, ToTAL yields consistent improvements in both retrieval-free and retrieval-augmented settings and demonstrates transferability to open-source models, indicating broad applicability. The results show that structured, reusable reasoning patterns, coupled with a lightweight update mechanism, can significantly enhance factual grounding and multi-step inference in large-context scenarios, with implications for enterprise knowledge systems and transparent reasoning.

Abstract

Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, all necessary information directly. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches derived from prior problem-solving traces; they structure how evidence is combined and guide multi-hop inference with factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating the approach's broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).
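
The update strategy described above (tracking which templates help, then refining weak ones through natural-language feedback) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the hit-rate threshold, the minimum-use cutoff, and the choice to reset counters after a rewrite are all assumptions made for illustration, and the two model calls that would produce feedback and a rewritten template are passed in as plain callables so the example runs without any LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ThoughtTemplate:
    """A reusable reasoning pattern with usage counters (hypothetical structure)."""
    name: str
    text: str
    hits: int = 0    # uses where the final answer was correct
    misses: int = 0  # uses where the final answer was wrong

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        # Assumption: a template that was never used is not treated as weak.
        return self.hits / total if total else 1.0


def record(templates: Dict[str, ThoughtTemplate], used: List[str], correct: bool) -> None:
    """Credit or blame every template that was composed into one answer."""
    for name in used:
        if correct:
            templates[name].hits += 1
        else:
            templates[name].misses += 1


def low_performing(templates: Dict[str, ThoughtTemplate],
                   threshold: float = 0.5, min_uses: int = 2) -> List[str]:
    """Names of templates used at least min_uses times with hit rate below threshold."""
    return [t.name for t in templates.values()
            if t.hits + t.misses >= min_uses and t.hit_rate() < threshold]


def refine(templates: Dict[str, ThoughtTemplate], names: List[str],
           feedback_fn: Callable[[ThoughtTemplate], str],
           rewrite_fn: Callable[[ThoughtTemplate, str], str]) -> None:
    """Rewrite the named templates from natural-language feedback; no model weights change."""
    for name in names:
        template = templates[name]
        feedback = feedback_fn(template)           # stand-in for an LLM critique
        template.text = rewrite_fn(template, feedback)  # stand-in for an LLM rewrite
        template.hits = template.misses = 0        # assumption: restart statistics after a rewrite
```

In practice, `feedback_fn` and `rewrite_fn` would wrap prompts to a language model; keeping them as parameters makes the bookkeeping testable on its own.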

Paper Structure

This paper contains 36 sections, 6 equations, 17 figures, 8 tables.

Figures (17)

  • Figure 1: Thoughts and facts in LCLM, compared to traditional RAG and simple stuffing in LCLM.
  • Figure 2: Illustration of training and inference stages for template updates. Low-performing templates are identified via hit/miss statistics and refined with textual gradient feedback, enabling improved performance on new queries during inference.
  • Figure 3: RAG results on MuSiQue, showing retrieval recall at different $k$ values (left) and QA performance (F1) (right).
  • Figure 4: Iteration results of updates on CRAG and MuSiQue.
  • Figure 5: Generalization of templates to open-source models.
  • ...and 12 more figures