RAG-BioQA: A Retrieval-Augmented Generation Framework for Long-Form Biomedical Question Answering

Lovely Yeswanth Panchumarthi, Sumalatha Saleti, Sai Prasad Gudari, Atharva Negi, Praveen Raj Budime, Harsit Upadhya

TL;DR

RAG-BioQA tackles the need for long-form, evidence-based biomedical QA by grounding a generative model in retrieved QA pairs. It combines BioBERT-based dense retrieval with FAISS indexing and LoRA-finetuned FLAN-T5 to generate comprehensive answers conditioned on retrieved contexts. The study shows domain-adapted dense retrieval outperforms zero-shot neural re-rankers and that fine-tuning yields substantial gains in semantic alignment (81% improvement in BERTScore). The open-source framework supports reproducible biomedical QA research and highlights the importance of domain-specific retrieval over general re-ranking for clinical QA.

Abstract

The rapid growth of biomedical literature makes it difficult to find specific medical information. Current biomedical question-answering systems focus primarily on short-form answers and fail to provide the comprehensive explanations needed for clinical decision-making. We present RAG-BioQA, a retrieval-augmented generation framework for long-form biomedical question answering. Our system integrates BioBERT embeddings with FAISS indexing for retrieval and a LoRA fine-tuned FLAN-T5 model for answer generation. We train on 181k QA pairs from PubMedQA, MedDialog, and MedQuAD, and evaluate on a held-out PubMedQA test set. We compare four retrieval strategies: dense retrieval (FAISS), BM25, ColBERT, and MonoT5. Our results show that domain-adapted dense retrieval outperforms zero-shot neural re-rankers, with the best configuration achieving 0.24 BLEU-1 and 0.29 ROUGE-1. Fine-tuning improves BERTScore by 81% over the base model. We release our framework to support reproducible biomedical QA research.
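
To make the reported BLEU-1 and ROUGE-1 figures easier to interpret, the following is a minimal sketch of how unigram-overlap scores of this kind are commonly computed. It is not the authors' evaluation code: the tokenizer, the single-reference setup, and the choice of ROUGE-1 F1 (rather than recall) are assumptions, and the function names `bleu1` and `rouge1_f1` are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(candidate, reference):
    # Clipped overlap: each candidate token counts at most as often
    # as it appears in the reference.
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def bleu1(candidate, reference):
    # Clipped unigram precision scaled by the brevity penalty.
    overlap, c_len, r_len = unigram_overlap(candidate, reference)
    if c_len == 0:
        return 0.0
    brevity = 1.0 if c_len >= r_len else math.exp(1 - r_len / c_len)
    return brevity * overlap / c_len


def rouge1_f1(candidate, reference):
    # Harmonic mean of unigram precision and recall.
    overlap, c_len, r_len = unigram_overlap(candidate, reference)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / c_len, overlap / r_len
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, not taken from the paper's data.
generated = "aspirin reduces fever"
gold = "aspirin reduces fever and pain"
print(round(bleu1(generated, gold), 3), round(rouge1_f1(generated, gold), 3))
```

In this toy case every generated token appears in the reference, so precision is 1.0, but the answer is shorter than the reference, which lowers BLEU-1 through the brevity penalty and ROUGE-1 through recall. Long-form answers tend to score low on both metrics for the opposite reason: they contain many tokens the reference does not.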

Paper Structure

This paper contains 19 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The RAG-BioQA framework follows a clear process. The system retrieves contexts using BioBERT embeddings and FAISS indexing. We apply re-ranking strategies to select informative data. The system combines these contexts with our query. A fine-tuned T5 model generates comprehensive answers.
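
The retrieve-then-generate flow described in Figure 1 can be illustrated with a small, dependency-free sketch. It is an illustration under stated assumptions, not the released framework: `embed` is a normalized bag-of-words stand-in for the BioBERT encoder, `FlatIndex` is a brute-force inner-product search playing the role of the FAISS index, the QA pairs are invented toy data, and the prompt template in `build_prompt` is hypothetical. The re-ranking and FLAN-T5 generation steps are omitted; a re-ranker would slot in between `search` and `build_prompt`.

```python
import math
import re
from collections import Counter

# Hypothetical toy QA pairs; the paper's corpus has about 181k pairs.
QA_PAIRS = [
    ("What does metformin treat?",
     "Metformin is a first-line drug for type 2 diabetes."),
    ("What causes iron deficiency anemia?",
     "Iron deficiency anemia is caused by low iron stores."),
    ("How is hypertension managed?",
     "Hypertension is managed with lifestyle changes and antihypertensive drugs."),
]


def embed(text):
    # Stand-in for a BioBERT encoder: L2-normalized token counts.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def inner_product(a, b):
    # On unit-length vectors this equals cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class FlatIndex:
    """Exhaustive inner-product search, standing in for a FAISS index."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        self.vectors.append(vector)

    def search(self, query_vector, k):
        scored = [(inner_product(query_vector, v), i)
                  for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:k]


def build_prompt(question, retrieved_pairs):
    # Assumed template: retrieved QA pairs as context, then the query.
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieved_pairs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


index = FlatIndex()
for q, a in QA_PAIRS:
    index.add(embed(q + " " + a))

query = "How is hypertension treated?"
hits = index.search(embed(query), k=2)
prompt = build_prompt(query, [QA_PAIRS[i] for _, i in hits])
print(prompt)
```

The resulting `prompt` string is what a fine-tuned sequence-to-sequence model would receive as input. Swapping `embed` for a real encoder and `FlatIndex` for a real vector index changes the quality of the retrieved context but not the shape of the pipeline.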