Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs
Xiaqiang Tang, Jian Li, Nan Du, Sihong Xie
TL;DR
This work introduces a Multi-Armed Bandit enhanced Retrieval-Augmented Generation framework for knowledge-graph-based QA, addressing non-stationary real-world environments by dynamically selecting among multiple retrieval methods using real-time feedback. It combines a DistilBERT-based query encoder, an epsilon-greedy arm selector, offline-to-online learning, and a Generalized Gini Index to balance multi-objective rewards such as accuracy and retrieval latency. Across two KGQA datasets, the proposed GGIMAB approach outperforms baselines in non-stationary settings and achieves state-of-the-art performance in stationary settings, demonstrating strong adaptability to backend upgrades and domain shifts. The results highlight the practical value of continuously adapting retrieval strategies in RAG systems to maintain informative and timely responses in dynamic environments.
Abstract
Despite the superior performance of large language models (LLMs) on many NLP tasks, they still face significant limitations in memorizing extensive world knowledge. Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation (RAG) framework, combined with Knowledge Graphs (KGs) that encapsulate extensive factual data in a structured format, robustly enhances the reasoning capabilities of LLMs. However, deploying such systems in real-world scenarios presents two challenges: the continuous evolution of non-stationary environments may lead to performance degradation, and user satisfaction requires a careful balance of performance and responsiveness. To address these challenges, we introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities under the rich and evolving retrieval contexts encountered in practice. Within this framework, each retrieval method is treated as a distinct "arm". The system uses real-time user feedback to adapt to dynamic environments by selecting the appropriate retrieval method based on the input query and the historical multi-objective performance of each arm. Extensive experiments conducted on two benchmark KGQA datasets demonstrate that our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments. Code and data are available at https://github.com/FUTUREEEEEE/Dynamic-RAG.git
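To make the arm-selection idea concrete, the following is a minimal sketch of an epsilon-greedy selector that scores each arm with a Generalized Gini Index (GGI) over its estimated multi-objective rewards. It is not the implementation from the linked repository. The class and function names, the constant step size, and the example weights are all hypothetical, and the sketch omits the query encoder, so arm choice here does not depend on the input query. It also assumes every objective has already been expressed as a reward in [0, 1] where higher is better (for example, latency converted to a responsiveness score), and that GGI sorts rewards in ascending order against non-increasing weights.

```python
import random


def ggi(rewards, weights):
    """Generalized Gini Index of a reward vector (higher is better).

    Assumed convention: rewards are sorted in ascending order and paired
    with non-increasing weights, so the worst objective counts the most.
    """
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, sorted(rewards)))


class EpsilonGreedyGGI:
    """Hypothetical non-contextual selector; each arm is a retrieval method."""

    def __init__(self, n_arms, n_objectives, weights,
                 epsilon=0.1, step_size=0.5, seed=0):
        self.weights = list(weights)
        self.epsilon = epsilon
        # A constant step size (an assumption of this sketch) weights recent
        # feedback more heavily, which suits non-stationary rewards.
        self.step_size = step_size
        self.estimates = [[0.0] * n_objectives for _ in range(n_arms)]
        self.rng = random.Random(seed)

    def select(self):
        """Explore with probability epsilon, otherwise pick the best GGI score."""
        n_arms = len(self.estimates)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(n_arms)
        scores = [ggi(est, self.weights) for est in self.estimates]
        return max(range(n_arms), key=scores.__getitem__)

    def update(self, arm, reward_vector):
        """Move the chosen arm's per-objective estimates toward the feedback."""
        self.estimates[arm] = [
            est + self.step_size * (r - est)
            for est, r in zip(self.estimates[arm], reward_vector)
        ]


# Usage: two arms, two objectives (accuracy, responsiveness), no exploration.
bandit = EpsilonGreedyGGI(n_arms=2, n_objectives=2,
                          weights=[0.7, 0.3], epsilon=0.0)
bandit.update(0, [1.0, 0.1])  # accurate but slow retrieval method
bandit.update(1, [0.6, 0.6])  # balanced retrieval method
chosen = bandit.select()
```

Because the largest weight falls on each arm's weakest objective, the balanced arm scores higher than the accurate-but-slow one in this toy run, which illustrates how a GGI-style scalarization discourages trading away one objective entirely for another.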
