Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs

Joon Park, Kyohei Atarashi, Koh Takeuchi, Hisashi Kashima

TL;DR

Long-context understanding in LLMs remains challenging when key facts are dispersed across very long texts. The authors propose a single-pass emulation of Retrieval Augmented Generation by prompting the model to tag relevant segments, generate localized summaries, and perform chain-of-thought reasoning before producing a final answer. Experiments on the BABILong benchmark (QA2, QA7, QA10) show the method can outperform naive baselines and standard RAG pipelines on multi-hop tasks, with results sensitive to prompt order and context length. This approach offers a practical, lightweight alternative to external retrievers, improving robustness of long-context comprehension without additional retrieval infrastructure.

Abstract

This paper addresses the challenge of comprehending very long contexts in Large Language Models (LLMs) by proposing a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought (CoT) reasoning. While recent LLMs support over 100,000 tokens in a single prompt, simply enlarging context windows has not guaranteed robust multi-hop reasoning when key details are scattered across massive input. Our approach treats the model as both the retriever and the reasoner: it first tags relevant segments within a long passage, then employs a stepwise CoT workflow to integrate these pieces of evidence. This single-pass method thereby reduces reliance on an external retriever, yet maintains focus on crucial segments. We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text. Compared to baseline (no retrieval) and naive RAG pipelines, our approach more accurately handles multi-fact questions such as object location tracking, counting, and indefinite knowledge. Furthermore, we analyze how prompt structure, including the order of question, relevant-text tags, and overall instructions, significantly affects performance. These findings underscore that optimized prompt engineering, combined with guided reasoning, can enhance LLMs' long-context comprehension and serve as a lightweight alternative to traditional retrieval pipelines.
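
The single-pass workflow described in the abstract (tag relevant segments, summarize them, reason step by step, then answer) can be illustrated with a small prompt builder. This is a minimal sketch under stated assumptions, not the authors' prompt: the tag names (`<relevant>`, `<summary>`, `<reasoning>`, `<answer>`), the instruction wording, and the helpers `build_single_pass_prompt` and `extract_answer` are all hypothetical. The `question_first` flag is included only because the abstract reports that the order of prompt components affects performance.

```python
import re
from typing import Optional


def build_single_pass_prompt(context: str, question: str,
                             question_first: bool = True) -> str:
    """Assemble one prompt that asks the model to retrieve and then reason.

    All tag names and instruction text are illustrative assumptions.
    """
    instructions = (
        "Step 1: Copy every sentence needed for the task inside "
        "<relevant>...</relevant> tags.\n"
        "Step 2: Summarize the tagged sentences inside <summary>...</summary>.\n"
        "Step 3: Reason step by step inside <reasoning>...</reasoning>.\n"
        "Step 4: Give the final answer inside <answer>...</answer>."
    )
    question_block = f"Question: {question}"
    context_block = f"Context:\n{context}"
    # Component order is configurable so different orderings can be compared.
    if question_first:
        parts = [instructions, question_block, context_block]
    else:
        parts = [instructions, context_block, question_block]
    return "\n\n".join(parts)


def extract_answer(model_output: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, or None if it is absent."""
    match = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    return match.group(1).strip() if match else None
```

The resulting string would be sent to the LLM in a single call, and `extract_answer` would be applied to the completion; no external retriever or second model call is involved in this sketch.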

Paper Structure

This paper contains 37 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Pipeline of Our Proposed Method
  • Figure 2: Illustrative Prompt Construction for Single-Pass Retrieval Emulation + CoT