Tuning LLMs by RAG Principles: Towards LLM-native Memory

Jiale Wei; Shuchi Wu; Ruochen Liu; Xiang Ying; Jingbo Shang; Fangbo Tao

TL;DR

This work investigates memory in LLMs by systematically comparing long-context and retrieval-augmented approaches, finding that long-context models excel at global reasoning while RAG excels at local, retrieval-driven queries. Building on these insights, the authors introduce RAG-Tuned-LLM, a method that synthesizes training data from GraphRAG using entity- and relationship-based templates, then fine-tunes a relatively small LLM via LoRA to encode memory directly in model parameters. Across three datasets, RAG-Tuned-LLM outperforms both standard RAG and long-context baselines on local and global queries, demonstrating strong memory capabilities without external retrieval during inference. The approach bridges open-domain and domain-specific QA, offering a memory-efficient, scalable path for LLM-native reasoning with hierarchical knowledge, and points to future extensions across domains and modalities.
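The data-synthesis step described above can be illustrated with a minimal sketch. It assumes entity and relationship records of the kind a GraphRAG indexing pass extracts (a name or endpoint pair plus a textual description), and it turns each record into one instruction/response pair suitable for LoRA fine-tuning. The `Entity` and `Relationship` classes, the two question templates, and the `instruction`/`response` field names are all hypothetical placeholders, not the templates used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical record: an entity name and its GraphRAG-style description.
    name: str
    description: str


@dataclass(frozen=True)
class Relationship:
    # Hypothetical record: a directed edge between two entities with a description.
    source: str
    target: str
    description: str


# Illustrative templates only; the paper's actual templates may differ.
ENTITY_TEMPLATE = "What do you know about {name}?"
RELATION_TEMPLATE = "How is {source} related to {target}?"


def synthesize_examples(entities, relationships):
    """Build instruction/response pairs from entity and relationship records."""
    examples = []
    for e in entities:
        # Entity-based sample: the question asks about one entity,
        # the answer is its stored description.
        examples.append({
            "instruction": ENTITY_TEMPLATE.format(name=e.name),
            "response": e.description,
        })
    for r in relationships:
        # Relationship-based sample: the question asks how two entities connect.
        examples.append({
            "instruction": RELATION_TEMPLATE.format(source=r.source, target=r.target),
            "response": r.description,
        })
    return examples


entities = [Entity("Alice", "Alice is a software engineer who leads the search team.")]
relationships = [Relationship("Alice", "Bob", "Alice mentors Bob on the search team.")]
for example in synthesize_examples(entities, relationships):
    print(example["instruction"], "->", example["response"])
```

Keeping entity-level and relationship-level samples separate mirrors the two granularities the TL;DR mentions; a fuller pipeline would presumably also vary phrasings and include community-level summaries, which this sketch omits.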

Abstract

Memory, i.e., information beyond what large language models (LLMs) acquire during training, is crucial to many real-world applications, such as personal assistants. The two mainstream solutions for incorporating memory into the generation process are long-context LLMs and retrieval-augmented generation (RAG). In this paper, we first systematically compare these two types of solutions on three renovated or newly built datasets and show that (1) long-context solutions, although more expensive, tend to capture the big picture more easily and better answer queries that require considering the memory as a whole; and (2) when queries concern specific information, RAG solutions tend to be more competitive, especially when the keywords can be matched explicitly. We therefore propose a novel method, RAG-Tuned-LLM, which fine-tunes a relatively small (e.g., 7B) LLM on data generated following RAG principles, so that it combines the advantages of both solutions. Extensive experiments on three datasets demonstrate that RAG-Tuned-LLM outperforms long-context LLMs and RAG methods across a wide range of query types.
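Finding (2) can be made concrete with a toy keyword-overlap retriever. This is an illustrative assumption, not one of the retrievers evaluated in the paper (practical RAG systems typically use BM25 or dense embeddings); the function names and the example memory chunks are hypothetical. When a local query shares explicit keywords with one chunk, simple lexical scoring already ranks that chunk first.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, chunk):
    # Count how often each distinct query token appears in the chunk.
    query_terms = set(tokenize(query))
    chunk_counts = Counter(tokenize(chunk))
    return sum(chunk_counts[t] for t in query_terms)


def retrieve(query, chunks, k=1):
    # Return the k chunks with the highest keyword-overlap score.
    ranked = sorted(chunks, key=lambda ch: keyword_score(query, ch), reverse=True)
    return ranked[:k]


memory = [
    "The meeting with Dana moved to Friday.",
    "Quarterly revenue grew by ten percent.",
    "Dana prefers tea over coffee.",
]
print(retrieve("When is the meeting with Dana?", memory))
```

The same scoring gives little useful signal for a global query such as "Summarize the main themes across all notes", because no single chunk shares its keywords; that is the kind of query where the abstract reports long-context solutions doing better.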
