Efficient Medical Question Answering with Knowledge-Augmented Question Generation

Julien Khlaut; Corentin Dancette; Elodie Ferreres; Alaedine Bennani; Paul Hérent; Pierre Manceron

TL;DR

The paper tackles the limited medical QA performance of small language models by combining textbook-based pre-training with GPT-4–generated, exam-style clinical cases and a new dataset, ECN-QA, featuring both independent and progressive questions. The authors demonstrate that starting from BioMedLM and incorporating book pre-training plus GPT-4–augmented data yields meaningful gains, ultimately surpassing GPT-3.5 on ECN-QA while remaining below GPT-4. The introduction of progressive questions in ECN-QA and the proposition-level fine-tuning strategy provide a practical path to robust, efficient medical QA for resource-constrained models. The work also discusses ethical considerations and suggests avenues for improvement, such as larger pre-training corpora and retrieval-based approaches to further close the gap with large LLMs in clinical contexts.

Abstract

In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. Additionally, we introduce ECN-QA, a novel medical question answering dataset containing "progressive questions" composed of related sequential questions. We show the benefits of our training strategy on this dataset. The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned. The code and weights are available at https://github.com/raidium-med/MQG.

Paper Structure (25 sections, 3 figures, 5 tables)

Introduction
Datasets
ECN-QA Dataset
Medical Textbooks
Method
Baseline Model
Questions Generation
Pre-training
Fine-tuning
Results
Evaluation of GPT models
Main Results
Conclusion
Ethical Concerns
Acknowledgement
...and 10 more sections

Figures (3)

Figure 1: Our training strategy. Starting from an existing language model such as BioMedLM, we continue the pre-training on our corpus of medical textbooks. Then, we use GPT-4, prompted with knowledge from the textbooks, to generate clinical cases that are used to fine-tune the model.
Figure 2: Distribution of per-question accuracy (number of correct propositions divided by the total number of propositions) for GPT-4 and BioMedLM + Books + MQG on the FreeCN dataset.
Figure 3: Accuracy per subject of BioMedLM and GPT-4
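
The per-question accuracy named in the Figure 2 caption (correct propositions divided by total propositions) can be illustrated with a minimal sketch. It assumes each proposition carries a true/false gold label and that the model outputs one true/false prediction per proposition; the function name `question_accuracy` and the sample data are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence


def question_accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of propositions whose predicted label matches the gold label.

    Assumption: one boolean label per proposition, in the same order in
    both sequences.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("a question needs at least one proposition")
    # Count the positions where prediction and gold label agree.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical question with five propositions; one prediction is wrong.
gold = [True, False, False, True, False]
predicted = [True, False, True, True, False]
print(question_accuracy(predicted, gold))  # 4 of 5 labels match
```

Collecting this value over every question in an evaluation set would give a distribution of the kind the caption describes.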
