LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
Xiao Yu, Yunan Lu, Zhou Yu
TL;DR
LocalRQA is an open-source, end-to-end toolkit for building retrieval-augmented QA (RQA) systems, covering data generation, retriever and generator training, system assembly, automatic evaluation, and local deployment. It integrates training algorithms and evaluation metrics drawn from recent RQA research, so researchers can customize pipelines and compare them against remote baselines without depending on paid APIs for their own models. Experiments on documentation from Databricks and Faire show that 7B-parameter models trained with LocalRQA reach performance similar to OpenAI baselines (text-ada-002 and GPT-4-turbo), which suggests that locally trained RQA systems are practical and cost-effective. The modular design and local deployment capabilities facilitate rapid prototyping, human evaluation, and iterative refinement through RLHF-inspired feedback loops.
Abstract
Retrieval-augmented question-answering systems combine retrieval techniques with large language models to provide answers that are more accurate and informative. Many existing toolkits allow users to quickly build such systems using off-the-shelf models, but they offer little support for researchers and developers who want to customize the model training, testing, and deployment process. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from Databricks' and Faire's websites. We find that 7B models trained and deployed using LocalRQA reach performance similar to that of systems using OpenAI's text-ada-002 and GPT-4-turbo.
