BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A

Samy Ateia, Udo Kruschwitz

TL;DR

BioRAGent addresses the need for grounded, transparent biomedical question answering in professional search. It introduces a retrieval-augmented generation pipeline that combines few-shot query expansion, snippet extraction, and answer generation with explicit PubMed citations, delivered via a Gradio web interface. The system grounds LLM outputs in PubMed data using Elasticsearch-based BM25 retrieval and multi-step snippet processing, achieving competitive results in the BioASQ 2024 challenge while preserving transparency through direct source links. The work demonstrates that simple in-context learning combined with RAG can support domain-specific search tasks, and it outlines future work on interface expansion, evaluation modules, and multi-LLM support. The practical impact lies in enabling biomedical researchers to obtain evidence-based answers with inspectable sources and editable queries.

Abstract

We present BioRAGent, an interactive web-based retrieval-augmented generation (RAG) system for biomedical question answering. The system uses large language models (LLMs) for query expansion, snippet extraction, and answer generation while maintaining transparency through citation links to the source documents and by displaying the generated queries for further editing. Building on our successful participation in the BioASQ 2024 challenge, we demonstrate how few-shot learning with LLMs can be effectively applied in a professional search setting. The system supports both direct, short paragraph-style responses and responses with inline citations. Our demo is available online, and the source code is publicly accessible through GitHub.
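
To make the pipeline stages concrete, the following is a minimal sketch of the flow described above: query expansion, BM25 ranking, and an answer with inline citations. It is not the authors' implementation. The toy corpus, the document IDs, the synonym table standing in for few-shot LLM query expansion, and all function names are assumptions made for illustration; the actual system uses LLM calls and an Elasticsearch index over PubMed. The scoring function is one common BM25 formulation with conventional default parameters.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for PubMed abstracts; IDs and texts are invented.
DOCS = {
    "PMID-A": "Metformin lowers blood glucose in patients with type 2 diabetes.",
    "PMID-B": "Aspirin reduces the risk of myocardial infarction.",
    "PMID-C": "Insulin therapy controls hyperglycemia in diabetes mellitus.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(question):
    """Stand-in for few-shot LLM query expansion (hypothetical synonym table)."""
    synonyms = {"diabetes": ["hyperglycemia"], "glucose": ["sugar"]}
    terms = tokenize(question)
    expanded = list(terms)
    for term in terms:
        expanded.extend(synonyms.get(term, []))
    return expanded


def bm25_rank(query_terms, docs, k1=1.2, b=0.75):
    """Rank document IDs by BM25 score; documents with no match are dropped."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(query_terms):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def answer_with_citations(question, docs, top_k=2):
    """Return a cited answer string and the list of cited document IDs."""
    ranked = bm25_rank(expand_query(question), docs)[:top_k]
    # Stand-in for LLM snippet extraction and answer generation:
    # each retrieved text is quoted verbatim, followed by its source ID.
    answer = " ".join(f"{docs[doc_id]} [{doc_id}]" for doc_id in ranked)
    return answer, ranked


if __name__ == "__main__":
    answer, cited = answer_with_citations("What treats diabetes?", DOCS)
    print(answer)
    print(cited)
```

Because the expanded query is an ordinary list of terms, it can be shown to the user and edited before retrieval, which mirrors the transparency goal stated in the abstract.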

Paper Structure

This paper contains 10 sections, 1 figure.

Figures (1)

  • Figure 1: Screenshot of part of the BioRAGent interface, showcasing query expansion