When Retrieval Succeeds and Fails: Rethinking Retrieval-Augmented Generation for LLMs

Yongjie Wang, Yue Yu, Kaisong Song, Jun Lin, Zhiqi Shen

TL;DR

This paper surveys Retrieval-Augmented Generation (RAG) as a paradigm for mitigating LLM hallucinations by injecting external knowledge. It outlines RAG's four-module architecture (indexing, retrieval, generation, orchestration), emphasizes the recall-precision trade-off, and discusses knowledge-graph-based indexing as a means to enhance cross-document reasoning. It identifies four key challenges (adaptive retrieval triggers, content relevance for complex reasoning, data trustworthiness, and understanding how retrieval interacts with in-context learning) and proposes future directions such as agentic RAG and integration with long-context LLMs. The authors argue that, despite growing LLM capabilities, RAG remains valuable for knowledge-intensive, private, and real-time knowledge tasks, and they advocate for next-generation systems that combine precise retrieval with robust reasoning.

Abstract

Large Language Models (LLMs) have enabled a wide range of applications through their powerful capabilities in language understanding and generation. However, because LLMs are trained on static corpora, they have difficulty addressing rapidly evolving information or domain-specific queries. Retrieval-Augmented Generation (RAG) was developed to overcome this limitation by integrating LLMs with external retrieval mechanisms, allowing them to access up-to-date and contextually relevant knowledge. As LLMs themselves continue to advance in scale and capability, however, the relative advantages of traditional RAG frameworks have become less pronounced, and the frameworks themselves less necessary. Here, we present a comprehensive review of RAG, beginning with its overarching objectives and core components. We then analyze the key challenges within RAG, highlighting critical weaknesses that may limit its effectiveness. Finally, we showcase applications where LLMs alone perform inadequately, but where RAG, when combined with LLMs, can substantially enhance their effectiveness. We hope this work will encourage researchers to reconsider the role of RAG and inspire the development of next-generation RAG systems.

Paper Structure

This paper contains 12 sections and 1 figure.

Figures (1)

  • Figure 1: The overall framework of RAG and its four core modules.