CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation
Ziting Wang, Haitao Yuan, Wei Dong, Gao Cong, Feifei Li
TL;DR
CORAG addresses three shortcomings of chunk selection in Retrieval-Augmented Generation (RAG) by modeling it as a cost-constrained optimization over chunk orders. It introduces a Monte Carlo Tree Search (MCTS)-based policy tree to capture inter-chunk correlations and non-monotonic utility, and couples this with a configuration agent that predicts domain-specific rerankers and MCTS settings via contrastive learning. The approach improves ROUGE scores by about 30% over strong baselines and scales to large chunk collections while keeping retrieval efficient. This work advances practical RAG by jointly optimizing retrieval, reranking, and prompt construction under explicit budget constraints, enabling more accurate and efficient information-grounded generation across diverse query domains.
Abstract
Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Because of the context window constraints of LLMs, it is impractical to input the entire external database directly into the model. Instead, only the most relevant pieces of information, referred to as chunks, are selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose CORAG, a cost-constrained retrieval optimization system for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS)-based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.
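To make the first two challenges concrete, the following is a minimal toy sketch in Python, not CORAG's implementation. The chunk data, the redundancy pair, the utility function (position-decayed relevance minus an overlap penalty minus a per-token cost), and every name in it are assumptions made for illustration. Exhaustive enumeration of chunk orders stands in for the tree search here; it is only feasible for a handful of chunks, which is the regime where a search method such as the MCTS-based policy described above would take over.

```python
from itertools import permutations

# Hypothetical toy data: chunk id -> (token cost, standalone relevance).
CHUNKS = {"a": (40, 0.9), "b": (40, 0.8), "c": (30, 0.5), "d": (50, 0.1)}
REDUNDANT_PAIRS = [{"a", "b"}]  # assumed: "b" largely repeats "a"
BUDGET = 120                    # assumed token budget for the prompt


def cost(order):
    """Total token cost of an ordered chunk selection."""
    return sum(CHUNKS[c][0] for c in order)


def utility(order, decay=0.9, overlap_penalty=0.7, cost_weight=0.004):
    """Assumed toy utility: later positions count less, redundant pairs
    are penalized, and every token spent carries a small cost."""
    gain = sum(CHUNKS[c][1] * decay**i for i, c in enumerate(order))
    overlap = sum(overlap_penalty for pair in REDUNDANT_PAIRS if pair <= set(order))
    return gain - overlap - cost_weight * cost(order)


def greedy_fill(budget=BUDGET):
    """Baseline: score chunks independently and pack as many as fit."""
    chosen = []
    for c in sorted(CHUNKS, key=lambda c: CHUNKS[c][1], reverse=True):
        if cost(chosen + [c]) <= budget:
            chosen.append(c)
    return tuple(chosen)


def best_order(budget=BUDGET):
    """Enumerate every ordered selection within budget and keep the best."""
    candidates = (
        order
        for k in range(len(CHUNKS) + 1)
        for order in permutations(CHUNKS, k)
        if cost(order) <= budget
    )
    return max(candidates, key=utility)


if __name__ == "__main__":
    for label, order in [("greedy", greedy_fill()), ("best", best_order())]:
        print(label, order, cost(order), round(utility(order), 3))
```

In this toy setting the independent, fill-the-budget baseline keeps the near-duplicate chunk and ends up with lower utility than a shorter selection, and appending the low-relevance chunk `d` to the best selection lowers utility even though the budget still allows it. This is the non-monotonic behavior the abstract describes, shown under the assumed utility above.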
