Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context
Sangwon Yu, Ik-hwan Kim, Jongyoon Song, Saehyung Lee, Junsung Park, Sungroh Yoon
TL;DR
The paper identifies misordered context as a key bottleneck in multi-hop reasoning for LLMs, showing that the order of supporting documents can drastically affect performance. It introduces CoRe, a context repetition augmentation, and proves theoretically that repeating the context can cover all possible orders, enabling the model to align its reasoning with an optimal contiguous chain. Practically, the authors bound repetition with a fixed $\hat{k}$ to control inference costs and demonstrate substantial improvements (up to 30 percentage points F1 on 2WikiMultihopQA and up to 70 percentage points accuracy on a synthetic task) across multiple benchmarks and models, including retrieve-and-reason scenarios. CoRe also reduces positional bias and is compatible with CoT, offering a scalable, order-aware enhancement for long-context reasoning in real-world retrieval-augmented QA tasks.
Abstract
Multi-hop reasoning, which requires multi-step reasoning based on the supporting documents within a given context, remains challenging for large language models (LLMs). LLMs often struggle to filter out irrelevant documents within the context, and their performance is sensitive to the absolute position of supporting documents within that context. In this paper, we identify an additional challenge: LLMs' performance is also sensitive to the order (that is, the relative position) in which the supporting documents are presented. We refer to this as the misordered context problem. To address this issue, we propose a simple yet effective, theoretically motivated method called context repetition (CoRe), which prompts the model by presenting the context repeatedly. This ensures that certain contiguous reasoning segments within the supporting documents are presented in the optimal order, effectively guiding the model's reasoning in the appropriate direction. Applying CoRe, we improve the F1 score by up to 30%p on multi-hop QA tasks and increase accuracy by up to 70%p on a synthetic task. Additionally, CoRe helps mitigate the well-known "lost-in-the-middle" problem in LLMs and can be effectively combined with retrieval-based approaches utilizing Chain-of-Thought (CoT) reasoning.
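The idea that repeating the context can present the supporting documents in a favorable order can be illustrated with a toy check. The sketch below is not the authors' implementation. It assumes that "covering" an order means the documents appear in that order as a (not necessarily adjacent) subsequence of the repeated context. The function names and the prompt layout in `build_prompt` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def repeat_context(documents, repetitions):
    """Return the document list repeated `repetitions` times, order unchanged."""
    return list(documents) * repetitions


def is_subsequence(target, sequence):
    """True if the items of `target` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds `item`,
    # so later items must occur after earlier ones.
    return all(item in remaining for item in target)


def covers_all_orders(documents, repetitions):
    """Check whether every ordering of `documents` is a subsequence
    of the repeated context (assumed meaning of 'covering' an order)."""
    repeated = repeat_context(documents, repetitions)
    return all(is_subsequence(order, repeated) for order in permutations(documents))


def build_prompt(documents, question, repetitions):
    """Hypothetical prompt layout: repeated context followed by the question."""
    context = "\n\n".join(repeat_context(documents, repetitions))
    return f"{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = ["Doc A", "Doc B", "Doc C"]
    for k in (1, 2, 3):
        print(k, covers_all_orders(docs, k))
    print(build_prompt(docs, "Who directed the film?", repetitions=2))
```

In this toy setting with three documents, one or two passes over the context leave the fully reversed order uncovered, while three passes cover every order. Prompt length grows linearly with the number of repetitions, which is in line with the motivation for bounding the repetition count in practice.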
