Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context

Sangwon Yu, Ik-hwan Kim, Jongyoon Song, Saehyung Lee, Junsung Park, Sungroh Yoon

TL;DR

The paper identifies misordered context as a key bottleneck in multi-hop reasoning for LLMs: the order in which supporting documents appear can drastically affect performance. It introduces CoRe, a context repetition augmentation, and proves that repeating the context can cover every possible document order, so the model can align its reasoning with an optimal contiguous chain. In practice, the authors cap the number of repetitions at a fixed $\hat{k}$ to control inference cost and report substantial improvements (up to 30 percentage points F1 on 2WikiMultihopQA and up to 70 percentage points accuracy on a synthetic task) across multiple benchmarks and models, including retrieve-and-reason settings. CoRe also reduces positional bias and is compatible with CoT, offering an order-aware enhancement for long-context reasoning in retrieval-augmented QA.

Abstract

Multi-hop reasoning, which requires multi-step reasoning based on the supporting documents within a given context, remains challenging for large language models (LLMs). LLMs often struggle to filter out irrelevant documents within the context, and their performance is sensitive to the absolute position of supporting documents within that context. In this paper, we identify an additional challenge: LLMs' performance is also sensitive to the order (that is, the relative position) in which the supporting documents are presented. We refer to this as the misordered context problem. To address this issue, and motivated by a theoretical analysis, we propose a simple yet effective method called context repetition (CoRe), which prompts the model by repeatedly presenting the context. This ensures that certain contiguous reasoning segments within supporting documents are presented in the optimal order, effectively guiding the model's reasoning in the appropriate direction. Applying CoRe, we improve the F1 score by up to 30%p on multi-hop QA tasks and increase accuracy by up to 70%p on a synthetic task. Additionally, CoRe helps mitigate the well-known "lost-in-the-middle" problem in LLMs and can be effectively combined with retrieval-based approaches utilizing Chain-of-Thought (CoT) reasoning.
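
As a rough illustration of what "repeatedly presenting the context" could look like at the prompt level, here is a minimal Python sketch. The function name `build_core_prompt`, the `Document [i]:` labels, and the `Question:`/`Answer:` template are assumptions made for this example; the page does not give the authors' actual prompt format.

```python
def build_core_prompt(documents, question, repetitions=2):
    """Build a prompt in which the full document list is repeated.

    Hypothetical helper: the template below is an assumption, not the
    prompt used in the paper. `repetitions` plays the role of a fixed
    cap on how many times the context is presented.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")

    blocks = []
    for _ in range(repetitions):
        # Each pass re-emits every document in its original order,
        # keeping the same index so repeated copies are recognizable.
        for idx, doc in enumerate(documents):
            blocks.append(f"Document [{idx}]: {doc}")

    context = "\n".join(blocks)
    return f"{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "The author of Book X was born in City Y.",
        "Book X was written by Person Z.",
    ]
    print(build_core_prompt(docs, "Where was the author of Book X born?"))
```

With two repetitions, the second copy of document [1] is followed later by... no further copy, but the first copy of document [1] is followed by the second copy of document [0], so the order "[1] then [0]" appears somewhere in the prompt even though the original order was "[0] then [1]".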

Paper Structure

This paper contains 28 sections, 2 theorems, 12 equations, 10 figures, 13 tables.

Key Result

Theorem 1

For a given context $C$ with its supporting documents ordered according to some permutation $\tau$, the augmented context $f_{\text{rep}}^{(k)}(C)$ belongs to the set $\mathcal{C}_\sigma$ for any permutation $\sigma$. Formally,

Figures (10)

  • Figure 1: An example of the misordered context problem in multi-hop reasoning of large language models, sampled from the MuSiQue dataset. Model performance is sensitive to the order of given documents.
  • Figure 2: Evaluation results of F1 score for each query type in MuSiQue with permuted clean contexts.
  • Figure 3: Illustration of CoRe where the model understands the context with the optimal order of documents [1] and [0]. Text with a white background is the prompt, and text with a yellow background is the model output.
  • Figure 4: Main results in the synthetic task. Repetition step denotes the number of additional repetitions ($\hat{k}-1$).
  • Figure 5: Performance of Llama-3.1-8B-Instruct with permuted contexts of MuSiQue during repetitions. The red line denotes the context in the worst order, and the purple line denotes the context in the best order.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Corollary 1.1
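
The coverage idea behind Theorem 1 can be checked on small cases with a short Python sketch. It assumes one particular reading of the statement: that a repeated context "belongs to" the set for a permutation when the supporting documents appear in that permutation's order as a (not necessarily adjacent) subsequence. That reading, and the helper names below, are assumptions for illustration and are not taken from the paper's definitions.

```python
from itertools import permutations


def repeat_context(docs, k):
    """Return the document list presented k times in a row."""
    return list(docs) * k


def contains_in_order(sequence, target):
    """True if `target` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so each match must come
    # strictly after the previous one.
    return all(item in it for item in target)


def min_repetitions_covering_all_orders(docs):
    """Smallest k (up to len(docs)) such that the k-fold repetition
    contains every permutation of `docs` as an ordered subsequence.
    Returns None for an empty document list."""
    for k in range(1, len(docs) + 1):
        repeated = repeat_context(docs, k)
        if all(contains_in_order(repeated, p) for p in permutations(docs)):
            return k
    return None


if __name__ == "__main__":
    print(min_repetitions_covering_all_orders(["A", "B", "C"]))
```

Under this reading, repeating n distinct documents n times always suffices, because the i-th document of any target order can be taken from the i-th copy. The brute-force search is only practical for a handful of documents, since it enumerates all permutations.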