BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine

Mingchen Li; Halil Kilicoglu; Hua Xu; Rui Zhang

TL;DR

BiomedRAG tackles hallucination and knowledge gaps in biomedical LLMs with a retrieval-augmented framework that feeds retrieved chunk-based documents directly into the LLM. A learnable tailored chunk scorer guides retrieval from a diverse chunk database, and an information extractor combines the input with the best retrieved chunk to produce task outputs. Across five biomedical NLP tasks (triple extraction, relation extraction, text classification, link prediction, and question answering) and nine datasets, BiomedRAG yields state-of-the-art results and consistently outperforms RA-KNN-style baselines, while analyses show that chunk diversity and LLM-guided scoring are both critical. The approach improves robustness to noise in biomedical texts and offers practical benefits for biomedical knowledge discovery, with open-source code and data to support future work.

Abstract

Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provides a way for these models to update their knowledge and enhance their performance. In contrast to previous retrieval-augmented LMs, which use specialized cross-attention mechanisms to help the LLM encode retrieved text, BiomedRAG adopts a simpler approach by directly inputting the retrieved chunk-based documents into the LLM. This straightforward design is easily applicable to existing retrieval and language models, and it effectively bypasses noisy information in retrieved documents, particularly in noise-intensive tasks. Moreover, we demonstrate the potential for using the LLM to supervise the retrieval model in the biomedical domain, enabling it to retrieve the document that helps the LLM improve its predictions. Our experiments reveal that, with the tuned scorer, BiomedRAG attains superior performance across 5 biomedical NLP tasks, encompassing information extraction (triple extraction, relation extraction), text classification, link prediction, and question answering, across 9 datasets. For instance, in the triple extraction task, BiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on the GIT and ChemProt corpora, respectively.
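The retrieve-then-concatenate flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the fixed-length word chunking, the duplicate removal used as a crude stand-in for a diverse chunk database, and the prompt layout are all hypothetical. In particular, the bag-of-words cosine scorer is only a placeholder for the learnable chunk scorer, which in BiomedRAG is tuned under supervision from the LLM.

```python
import math
from collections import Counter


def make_chunks(sentences, m):
    """Split each sentence into consecutive chunks of at most m words."""
    chunks = []
    for sentence in sentences:
        words = sentence.split()
        for start in range(0, len(words), m):
            chunks.append(" ".join(words[start:start + m]))
    # Drop exact duplicates, keeping first-seen order. This is an assumed,
    # simplified stand-in for building a diverse chunk database.
    return list(dict.fromkeys(chunks))


def cosine_score(query, chunk):
    """Bag-of-words cosine similarity; a placeholder for a learned scorer."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    dot = sum(q[w] * c[w] for w in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    c_norm = math.sqrt(sum(v * v for v in c.values()))
    norm = q_norm * c_norm
    return dot / norm if norm else 0.0


def build_prompt(query, chunks, scorer=cosine_score):
    """Pick the highest-scoring chunk and place it directly in the LLM input."""
    best = max(chunks, key=lambda ch: scorer(query, ch))
    # Hypothetical prompt layout: retrieved chunk first, then the input text.
    prompt = f"Retrieved chunk: {best}\nInput: {query}\nOutput:"
    return best, prompt


corpus = [
    "aspirin inhibits cyclooxygenase enzymes in platelets",
    "metformin lowers blood glucose levels",
]
chunk_db = make_chunks(corpus, m=3)
best_chunk, prompt = build_prompt("aspirin inhibits which enzymes", chunk_db)
print(prompt)
```

Because `build_prompt` takes the scorer as an argument, a trained scoring model could be swapped in without changing the rest of the sketch.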

Paper Structure (36 sections, 11 equations, 4 figures, 6 tables)

Introduction
Results
Comparative Assessments between Our biomedRAG Framework with Other Models
Triple Extraction
Relation Extraction
Text Classification
Link Prediction
Comparative Assessments with RAG models
Model Performance under Different Chunk Sizes
Module (tailored chunk scorer, diversity operation) assessment
Discussion
Method
Datasets
Triple Extraction Dataset
Relation Extraction
...and 21 more sections

Figures (4)

Figure 1: F1 (a-h)/Accuracy (i) performance of different models. BR refers to BiomedRAG. The red font indicates the performance of BiomedRAG.
Figure 2: F1 (a-h)/Accuracy (i) performance of the KNN-based retrieval model on 5 tasks over 9 datasets. We select the LLM with the highest baseline performance as our primary LLM. The number of document samples depends on the input length constraint of the LLM. For instance, in dataset Mssample, when top-$n > 2$, the input length of 8318 tokens exceeds GPT-4's maximum input length of 8192 tokens. y-axis: F1 (a-h)/Accuracy (i). x-axis: 0 represents no retriever, while 1-30 represents the top-$n$ documents retrieved. The red font refers to the best performance.
Figure 3: (a): Precision (P), Recall (R), and F1 results with different chunk length $m$ settings in the biomedical triple extraction task. (b): Precision (P), Recall (R), and F1 results with different chunk length $m$ settings in the biomedical relation extraction task.
Figure 4: Overview of BiomedRAG.
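
The Figure 2 caption notes that the number of retrieved documents is bounded by the LLM's input length limit (8192 tokens for GPT-4 in the caption's example). A minimal sketch of that constraint follows, assuming a hypothetical helper that keeps the longest prefix of a ranked document list that fits alongside the input. Whitespace word counts stand in for a real tokenizer here, so the numbers are only indicative.

```python
def count_tokens(text):
    """Whitespace token count; a rough proxy for a real tokenizer."""
    return len(text.split())


def docs_within_budget(query, ranked_docs, max_tokens=8192):
    """Return the longest prefix of ranked_docs that fits with the query."""
    used = count_tokens(query)
    kept = []
    for doc in ranked_docs:
        cost = count_tokens(doc)
        if used + cost > max_tokens:
            break  # adding this document would exceed the input limit
        kept.append(doc)
        used += cost
    return kept


ranked = ["c d e", "f g h", "i j"]
kept = docs_within_budget("a b", ranked, max_tokens=8)
print(len(kept))
```

The effective top-$n$ is then `len(kept)`, which may be smaller than the requested $n$ for datasets with long documents.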
