RRAML: Reinforced Retrieval Augmented Machine Learning
Andrea Bacciu, Florin Cuconasu, Federico Siciliano, Fabrizio Silvestri, Nicola Tonellotto, Giovanni Trappolini
TL;DR
This work introduces RRAML, a retrieval-augmented framework that couples a Generative Language Model, a Retriever, and a Reasoner, and uses reinforcement learning to align the Retriever and prompt generation with the final task. It avoids fine-tuning the Reasoner or accessing its gradients by relying on a reward-driven loop (e.g., PPO or DQN) that updates the Retriever and prompt generator based on task outcomes and, optionally, human feedback. By tightly coupling retrieval with reasoning, RRAML mitigates hallucinations and reduces exposure to harmful documents, while supporting arbitrarily large databases. The approach aims to democratize access to powerful LLM reasoning, especially for users who lack the resources to fine-tune models or maintain extensive internal infrastructure. A concrete use case illustrates how a private data repository can be used effectively within context constraints to produce accurate, task-focused responses.
Abstract
The emergence of large language models (LLMs) has revolutionized machine learning and related fields, showcasing remarkable abilities in comprehending, generating, and manipulating human language. However, their conventional usage through API-based text prompt submissions imposes limitations in terms of context constraints and external source availability. To address these challenges, we propose a novel framework called Reinforced Retrieval Augmented Machine Learning (RRAML). RRAML integrates the reasoning capabilities of LLMs with supporting information retrieved by a purpose-built retriever from a vast user-provided database. By leveraging recent advancements in reinforcement learning, our method addresses several critical challenges. First, it circumvents the need for accessing LLM gradients. Second, it alleviates the burden of retraining LLMs for specific tasks, which is often impractical or impossible due to restricted access to the model and the computational cost involved. Third, it links the retriever's task with the reasoner, mitigating hallucinations and reducing the retrieval of irrelevant and potentially damaging documents. We believe that the research agenda outlined in this paper has the potential to profoundly impact the field of AI, democratizing access to and utilization of LLMs for a wide range of entities.
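To make the gradient-free coupling concrete, the following is a minimal sketch and not the implementation proposed in this paper. It assumes a toy three-document database, a hypothetical `black_box_reasoner` standing in for an API-only LLM, and a retriever reduced to one learnable score per document. For brevity it uses a plain REINFORCE update with a running-mean baseline in place of PPO or DQN. All names and hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical toy database: only one document contains the needed fact.
DOCS = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "The capital of Italy is Rome.",
]
QUERY = "What is the capital of Italy?"
TARGET = "Rome"


def black_box_reasoner(query: str, context: str) -> str:
    """Stand-in for an API-only LLM: text in, text out, no gradients exposed."""
    return "Rome" if "Rome" in context else "unknown"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_retriever(steps=500, lr=0.5, seed=0):
    """Train the toy retriever policy from task reward alone."""
    rng = random.Random(seed)
    logits = [0.0] * len(DOCS)  # retriever parameters: one score per document
    baseline = 0.0              # running mean reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one document to place in the reasoner's context.
        choice = rng.choices(range(len(DOCS)), weights=probs)[0]
        answer = black_box_reasoner(QUERY, DOCS[choice])
        # Task outcome is the only learning signal.
        reward = 1.0 if answer == TARGET else 0.0
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # REINFORCE: d log pi(choice) / d logits = one_hot(choice) - probs.
        for j in range(len(DOCS)):
            grad = (1.0 if j == choice else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_retriever()
    print([round(p, 3) for p in final_probs])
```

In this sketch the reasoner is only ever called as a function, so the update touches the retriever's scores alone. A realistic retriever would score documents with a learned encoder over a large database, and the reward could also incorporate human feedback.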
