Clinical QA 2.0: Multi-Task Learning for Answer Extraction and Categorization

Priyaranjan Pattnayak; Hitesh Laxmichand Patel; Amit Agarwal; Bhargava Kumar; Srikant Panda; Tejaswini Kumar

TL;DR

This work tackles the need for clinical question answering outputs that are both accurate and structurally informative within EMRs. It introduces a multi-task learning framework that jointly trains answer extraction and medical categorization, using a ClinicalBERT backbone with a span-prediction head and a medical categorization head over five categories, aided by UMLS/SciSpacy for labeling and soft-labeling for ambiguity. On the emrQA dataset, the method yields improvements over single-task baselines, notably a 2.2 percentage point gain in QA F1 and a 6.2 percentage point gain in classification accuracy, with ClinicalBERT outperforming BioBERT in both tasks. The approach enables structured retrieval, better decision support, and potential integration with clinical decision support and coding workflows, while identifying avenues for future work such as retrieval-augmented methods and ontology-enhanced classification.

Abstract

Clinical Question Answering (CQA) plays a crucial role in medical decision-making, enabling physicians to extract relevant information from Electronic Medical Records (EMRs). While transformer-based models such as BERT, BioBERT, and ClinicalBERT have demonstrated state-of-the-art performance in CQA, existing models lack the ability to categorize extracted answers, which is critical for structured retrieval, content filtering, and medical decision support. To address this limitation, we introduce a Multi-Task Learning (MTL) framework that jointly trains CQA models for both answer extraction and medical categorization. In addition to predicting answer spans, our model classifies responses into five standardized medical categories: Diagnosis, Medication, Symptoms, Procedure, and Lab Reports. This categorization enables more structured and interpretable outputs, making clinical QA models more useful in real-world healthcare settings. We evaluate our approach on emrQA, a large-scale dataset for medical question answering. Results show that MTL improves F1-score by 2.2 percentage points compared to standard fine-tuning, while achieving 90.7% accuracy in answer categorization. These findings suggest that MTL not only enhances CQA performance but also introduces an effective mechanism for categorization and structured medical information retrieval.
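
To make the joint objective concrete, the following is a minimal sketch of how a span-extraction loss and a soft-label categorization loss might be combined for one training example. It is not the authors' implementation: the additive form, the weight `cls_weight`, and all function names are assumptions introduced for illustration, and plain Python lists stand in for the logits that a ClinicalBERT backbone with two heads would produce.

```python
import math

# The five answer categories named in the abstract.
CATEGORIES = ["Diagnosis", "Medication", "Symptoms", "Procedure", "Lab Reports"]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def span_loss(start_logits, end_logits, start_idx, end_idx):
    """Average cross-entropy of the gold start and end token positions."""
    start_lp = log_softmax(start_logits)[start_idx]
    end_lp = log_softmax(end_logits)[end_idx]
    return -0.5 * (start_lp + end_lp)


def soft_label_loss(category_logits, target_probs):
    """Cross-entropy against a probability distribution over categories.

    A soft target such as [0.7, 0.0, 0.3, 0.0, 0.0] expresses an answer
    that is mostly a Diagnosis but partly a Symptom.
    """
    log_p = log_softmax(category_logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_p))


def multitask_loss(start_logits, end_logits, start_idx, end_idx,
                   category_logits, target_probs, cls_weight=0.5):
    """Span loss plus a weighted categorization loss.

    cls_weight is a hypothetical hyperparameter; the source does not
    state how the two task losses are balanced.
    """
    qa = span_loss(start_logits, end_logits, start_idx, end_idx)
    cls = soft_label_loss(category_logits, target_probs)
    return qa + cls_weight * cls


if __name__ == "__main__":
    # Toy example: a 6-token context whose answer spans tokens 2..3.
    start = [0.1, 0.2, 2.5, 0.3, 0.0, -0.4]
    end = [0.0, 0.1, 0.4, 2.2, 0.2, -0.1]
    cat_logits = [1.8, 0.1, 0.9, -0.3, -0.5]
    soft_target = [0.7, 0.0, 0.3, 0.0, 0.0]
    print(multitask_loss(start, end, 2, 3, cat_logits, soft_target))
```

In a real model both heads would read the same encoder output, so gradients from the categorization term also update the shared backbone; that sharing is what makes the setup multi-task rather than two independent classifiers.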
