RAG-BioQA: A Retrieval-Augmented Generation Framework for Long-Form Biomedical Question Answering
Lovely Yeswanth Panchumarthi, Sumalatha Saleti, Sai Prasad Gudari, Atharva Negi, Praveen Raj Budime, Harsit Upadhya
TL;DR
RAG-BioQA tackles the need for long-form, evidence-based biomedical QA by grounding a generative model in retrieved QA pairs. It combines BioBERT-based dense retrieval with FAISS indexing and LoRA-finetuned FLAN-T5 to generate comprehensive answers conditioned on retrieved contexts. The study shows domain-adapted dense retrieval outperforms zero-shot neural re-rankers and that fine-tuning yields substantial gains in semantic alignment (81% improvement in BERTScore). The open-source framework supports reproducible biomedical QA research and highlights the importance of domain-specific retrieval over general re-ranking for clinical QA.
Abstract
The rapid growth of biomedical literature makes it difficult to locate specific medical information. Current biomedical question-answering systems focus primarily on short-form answers and fail to provide the comprehensive explanations needed for clinical decision-making. We present RAG-BioQA, a retrieval-augmented generation framework for long-form biomedical question answering. Our system integrates BioBERT embeddings with FAISS indexing for retrieval and a LoRA fine-tuned FLAN-T5 model for answer generation. We train on 181k QA pairs from PubMedQA, MedDialog, and MedQuAD, and evaluate on a held-out PubMedQA test set. We compare four retrieval strategies: dense retrieval (FAISS), BM25, ColBERT, and MonoT5. Our results show that domain-adapted dense retrieval outperforms zero-shot neural re-rankers, with the best configuration achieving 0.24 BLEU-1 and 0.29 ROUGE-1. Fine-tuning improves BERTScore by 81% over the base model. We release our framework to support reproducible biomedical QA research.
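The retrieve-then-generate flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the released implementation. A normalised bag-of-words vector stands in for the BioBERT embedding, a brute-force inner-product search stands in for the FAISS index, and the prompt template is a guess at how retrieved QA pairs might be passed to the generator. The names `embed`, `FlatIndex`, and `build_prompt`, the toy QA pairs, and the choice to index question text only are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for a BioBERT embedding (assumption): an
    L2-normalised bag-of-words vector stored as a sparse dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def inner_product(a, b):
    """Inner product of two sparse vectors; equals cosine similarity
    here because both vectors are unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class FlatIndex:
    """Brute-force inner-product search. Hypothetical stand-in for an
    exact FAISS index; it does not use the FAISS library."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query_vector, k):
        scored = [(inner_product(query_vector, v), i)
                  for i, v in enumerate(self.vectors)]
        # Highest score first; ties broken by insertion order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(score, self.payloads[i]) for score, i in scored[:k]]


def build_prompt(question, retrieved):
    """Assumed prompt layout: retrieved QA pairs, then the new question."""
    lines = ["Answer the question using the retrieved QA pairs as context."]
    for rank, (_, (q, a)) in enumerate(retrieved, start=1):
        lines.append(f"Context {rank}: Q: {q} A: {a}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Toy QA pairs written for this sketch; not drawn from the training data.
QA_PAIRS = [
    ("What are common symptoms of type 2 diabetes?",
     "Increased thirst, frequent urination, and fatigue."),
    ("How is hypertension diagnosed?",
     "By repeated blood pressure measurements over several visits."),
    ("What causes iron deficiency anemia?",
     "Inadequate iron intake, blood loss, or poor absorption."),
]

index = FlatIndex()
for q, a in QA_PAIRS:
    index.add(embed(q), (q, a))  # assumption: index the question text only

query = "What symptoms does type 2 diabetes cause?"
top = index.search(embed(query), k=2)
prompt = build_prompt(query, top)
```

In a real pipeline the resulting `prompt` string would be passed to the fine-tuned generator. The sketch stops at prompt construction because the generation step depends on model weights and settings that the abstract does not specify.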
