Large Language Models for Multi-Choice Question Classification of Medical Subjects
Víctor Ponce-López
TL;DR
The paper addresses automatic question answering in the medical domain by framing the classification of multi-choice questions into medical subjects as a multi-class problem and fine-tuning large language models for this task. It introduces the Multi-Question (MQ) Sequence-BERT framework, which uses Sentence-BERT embeddings to represent a set of questions and classify them into 21 medical subjects. On the MedMCQA dataset, the approach achieves an accuracy of 0.68 on the development set and 0.60 on the test set, outperforming prior state-of-the-art methods and demonstrating robust multi-class discrimination without context. This work highlights the potential of LLM-based multi-class classification to support automatic medical QA and lays groundwork for future integration into clinical decision-support workflows.
Abstract
The aim of this paper is to evaluate whether large language models trained on multi-choice question data can be used to discriminate between medical subjects, an important and challenging task for automatic question answering. To this end, we train deep neural networks for multi-class classification of questions into the inferred medical subjects. Using our Multi-Question (MQ) Sequence-BERT method, we outperform the state-of-the-art results on the MedMCQA dataset, with accuracies of 0.68 and 0.60 on its development and test sets, respectively. These results show the capability of AI, and LLMs in particular, for multi-class classification tasks in the healthcare domain.
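
To make the task setup concrete, the following is a minimal sketch of multi-class subject classification and of the accuracy metric quoted above. It is not the MQ Sequence-BERT implementation: the embedding is a random stand-in for a sentence embedding, the weights are untrained, and the names `predict_subject`, `accuracy`, and `EMBED_DIM` are hypothetical, chosen only for illustration. Only the number of classes (21) comes from the text.

```python
import math
import random

NUM_SUBJECTS = 21  # number of medical subjects stated in the TL;DR
EMBED_DIM = 8      # assumption: toy size; real sentence embeddings are much larger


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_subject(embedding, weights, biases):
    """Return (predicted subject index, class probabilities) for one embedding."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


def accuracy(predicted, gold):
    """Fraction of questions whose predicted subject matches the gold subject."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Untrained linear head and a random stand-in for one question embedding.
rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_SUBJECTS)]
biases = [0.0] * NUM_SUBJECTS
question_embedding = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]

subject, probs = predict_subject(question_embedding, weights, biases)
```

Under this reading, an accuracy of 0.60 means that 60% of the questions in a split were assigned their gold subject; for instance, `accuracy([1, 2, 3, 4, 5], [1, 2, 3, 0, 0])` counts three matches out of five.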
