Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Md Sultan Al Nahian, Ramakanth Kavuluru
TL;DR
The paper addresses answer extraction from clinical radiology notes by applying encoder-decoder transformers, further fine-tuned with Direct Preference Optimization (DPO), to the RadQA radiology question answering task. It shows that encoder-decoder models outperform prior BERT-based baselines by over 10 F1 points, and that DPO fine-tuning adds a further 1–3 F1 points, for a total gain of 12–15 points over the previous state of the art. A key contribution is the automatic generation of high-quality preference data, without human input, using model-based and rule-based strategies, together with an analysis of how factors such as model size and negative-data diversity influence the improvements. The work presents DPO as an effective and computationally efficient alternative to RLHF for information-extraction tasks, with practical implications for radiology reading comprehension systems and possible extension to other clinical NLP tasks.
Abstract
Extractive question answering over clinical text is crucial for coping with the deluge of clinical text generated in hospitals. While encoder models (e.g., BERT) have been popular for this reading comprehension task, encoder-decoder models (e.g., T5) have recently been on the rise. Preference optimization techniques have also emerged for aligning decoder-only LLMs with human preferences. In this paper, we combine encoder-decoder models with the direct preference optimization (DPO) method to improve over the prior state of the art for the RadQA radiology question answering task by 12-15 F1 points. To the best of our knowledge, this effort is the first to show that the DPO method also works for reading comprehension, using novel heuristics to generate preference data without human input.
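To make the DPO objective concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair, paired with one hypothetical rule for building such a pair without human input. The function names, the prompt format, the default `beta`, and the truncation rule are illustrative assumptions and are not taken from the paper; the paper's actual model-based and rule-based strategies may differ.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities of each answer under the policy
    being trained and under a frozen reference model. beta=0.1 is an
    assumed default, not a value reported in the paper.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def make_rule_based_pair(question, context, gold_answer):
    """Hypothetical rule: reject the gold span with its last token dropped."""
    tokens = gold_answer.split()
    rejected = " ".join(tokens[:-1]) if len(tokens) > 1 else ""
    return {
        "prompt": f"question: {question} context: {context}",  # assumed format
        "chosen": gold_answer,
        "rejected": rejected,
    }
```

When the policy and the reference model assign identical log-probabilities, the loss equals log 2; it falls below that as the policy shifts probability toward the chosen answer relative to the reference, and rises above it when the policy favors the rejected answer.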
