LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems

Xiao Yu, Yunan Lu, Zhou Yu

TL;DR

LocalRQA is an open-source, end-to-end toolkit for building retrieval-augmented QA (RQA) systems, covering data generation, retriever and generator training, system assembly, automatic evaluation, and local deployment. It integrates training algorithms and evaluation metrics drawn from recent RQA research, so researchers can customize pipelines and benchmark against remote baselines without relying on paid APIs. Experiments on Databricks and Faire document corpora show that 7B-parameter models trained with LocalRQA reach performance similar to systems built on OpenAI's text-ada-002 and GPT-4-turbo, which suggests that locally trained RQA systems are practical and cost-effective. The modular design and local deployment tools support rapid prototyping, human evaluation, and iterative refinement through RLHF-inspired feedback loops.

Abstract

Retrieval-augmented question-answering systems combine retrieval techniques with large language models to provide answers that are more accurate and informative. Many existing toolkits allow users to quickly build such systems using off-the-shelf models, but they offer little support for researchers and developers who want to customize the model training, testing, and deployment process. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from Databricks and Faire's websites. We find that 7B models trained and deployed using LocalRQA reach performance similar to that of systems using OpenAI's text-ada-002 and GPT-4-turbo.

Paper Structure (52 sections, 1 equation, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Given a collection of documents, LocalRQA provides tools to generate RQA data, to train and test open-source models, and to deploy the RQA system for human evaluation or as an interactive chatbot.
  • Figure 2: An overview of the LocalRQA toolkit, which supports the entire pipeline of developing an RQA system: from data processing to training, testing, and serving an RQA system. Different from many existing toolkits, we feature a wide selection of training, testing, and serving methods curated from the latest RQA research.
  • Figure 3: Assembling an RQA system.
  • Figure A1: Researchers can launch a human evaluation page using LocalRQA with a single command. Given a prediction file (see the Evaluation subsection), LocalRQA launches a web server that allows other users to evaluate the quality of pre-generated responses. Evaluation results are automatically saved for researchers to conduct further analysis.
  • Figure A2: Researchers can launch an interactive chat page with LocalRQA using three commands. LocalRQA uses FastChat as its model-controller back end to handle load balancing. Chat histories are automatically saved for researchers to conduct further analysis or model training.
  • ...and 1 more figure