MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation

Tianyu Fan, Jingyuan Wang, Xubin Ren, Chao Huang

TL;DR

MiniRAG tackles the challenge of performing retrieval-augmented generation with small language models on resource-constrained devices. It introduces a semantic-aware heterogeneous graph that unites text chunks and named entities, paired with a topology-enhanced retrieval strategy that leverages graph structure to reduce reliance on deep semantic understanding. Across on-device benchmarks, MiniRAG achieves 1.3 to 2.5 times higher effectiveness than existing lightweight baselines while using only about 25% of the storage, and it maintains robustness when transitioning from LLMs to SLMs. The authors also release LiHuaWorld, a realistic on-device RAG benchmark, and provide fully open-source code and datasets, advancing practical edge AI with private, efficient knowledge retrieval and generation.

Abstract

The growing demand for efficient and lightweight Retrieval-Augmented Generation (RAG) systems has highlighted significant challenges when deploying Small Language Models (SLMs) in existing RAG frameworks. Current approaches face severe performance degradation due to SLMs' limited semantic understanding and text processing capabilities, creating barriers for widespread adoption in resource-constrained scenarios. To address these fundamental limitations, we present MiniRAG, a novel RAG system designed for extreme simplicity and efficiency. MiniRAG introduces two key technical innovations: (1) a semantic-aware heterogeneous graph indexing mechanism that combines text chunks and named entities in a unified structure, reducing reliance on complex semantic understanding, and (2) a lightweight topology-enhanced retrieval approach that leverages graph structures for efficient knowledge discovery without requiring advanced language capabilities. Our extensive experiments demonstrate that MiniRAG achieves comparable performance to LLM-based methods even when using SLMs while requiring only 25% of the storage space. Additionally, we contribute a comprehensive benchmark dataset for evaluating lightweight RAG systems under realistic on-device scenarios with complex queries. We fully open-source our implementation and datasets at: https://github.com/HKUDS/MiniRAG.
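
To make the two ideas in the abstract concrete, here is a minimal toy sketch, not the authors' implementation. It assumes entities are already known and matched by plain substring search (a stand-in for the model-based extraction and semantic matching a real system would use). The function names `build_graph` and `retrieve`, the scoring weights 1.0 and 0.5, and the sample chunks are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_graph(chunks, entities):
    """Build a toy heterogeneous graph with chunk nodes and entity nodes.

    Assumption: an entity is linked to every chunk that mentions it
    (case-insensitive substring match), and two entities are linked
    when they co-occur in the same chunk.
    """
    entity_to_chunks = defaultdict(set)
    entity_to_entities = defaultdict(set)
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        present = [e for e in entities if e.lower() in lowered]
        for entity in present:
            entity_to_chunks[entity].add(chunk_id)
            for other in present:
                if other != entity:
                    entity_to_entities[entity].add(other)
    return entity_to_chunks, entity_to_entities


def retrieve(query, entities, entity_to_chunks, entity_to_entities, top_k=2):
    """Rank chunks by graph connectivity to entities found in the query.

    Hypothetical scoring: a chunk earns 1.0 for each query entity it
    mentions and 0.5 for each one-hop neighbor entity it mentions.
    """
    lowered = query.lower()
    seeds = [e for e in entities if e.lower() in lowered]
    scores = defaultdict(float)
    for seed in seeds:
        for chunk_id in entity_to_chunks.get(seed, ()):
            scores[chunk_id] += 1.0
        for neighbor in entity_to_entities.get(seed, ()):
            for chunk_id in entity_to_chunks.get(neighbor, ()):
                scores[chunk_id] += 0.5
    # Sort by descending score, breaking ties by chunk id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


# Made-up sample data, not taken from the paper or its benchmark.
chunks = {
    "c1": "Alice booked a table at the noodle bar.",
    "c2": "Bob asked Alice about the gym schedule.",
    "c3": "The gym closes early on Sunday.",
}
entities = ["Alice", "Bob", "gym", "noodle bar"]

entity_to_chunks, entity_to_entities = build_graph(chunks, entities)
result = retrieve("When does Bob's gym close?", entities,
                  entity_to_chunks, entity_to_entities)
print(result)
```

In this toy example the query matches the entities "Bob" and "gym", so chunk c2 (which mentions both) ranks first and c3 second. Ranking here depends only on which nodes are connected, which is the sense in which graph topology can stand in for deeper semantic understanding.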

Paper Structure (13 sections, 1 equation, 4 figures, 3 tables)

This paper contains 13 sections, 1 equation, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: MiniRAG employs a streamlined workflow built on two key components: heterogeneous graph indexing and lightweight graph-based knowledge retrieval. This architecture addresses the unique challenges faced by on-device RAG systems, optimizing for both efficiency and effectiveness.
  • Figure 2: Compared to Large Language Models (LLMs), Small Language Models (SLMs) show significant limitations in both indexing and answering phases. Left: SLMs generate notably lower-quality descriptions than LLMs. Right: When processing identical inputs, SLMs struggle to locate relevant information in large contexts, while LLMs perform this task effectively.
  • Figure 3: Accuracy vs. Storage Efficiency: Comparative analysis of three RAG systems - MiniRAG, LightRAG, and GraphRAG.
  • Figure A1: LiHuaWorld simulates a digitally interconnected world where AI agents communicate through mobile chat applications. Through the lens of our primary subject, Li Hua, we observe and collect authentic chat interactions within this virtual social ecosystem.