Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT
Jamie Deng, Yusen Wu, Yelena Yesha, Phuong Nguyen
TL;DR
The paper tackles automated identification of venous thromboembolism (VTE) from radiology reports, addressing limitations of prior complex pipelines that relied on hand-crafted rule sets. It proposes a Mamba-based classifier using a State Space Model foundation with long-context capability (8K tokens) and end-to-end fine-tuning, eliminating the need for rule-based classification and improving efficiency. Comparative experiments show the Mamba model achieving 97% accuracy and F1 on DVT and 98% accuracy and F1 on PE, outperforming or matching BERT baselines while handling longer texts; a lightweight Phi-3 Mini LLM with QLoRA also performs well but is more computationally demanding. The results support the practicality of long-context, rule-free NLP solutions for critical medical text classification, with potential for deployment in real-world clinical workflows.
Abstract
Venous thromboembolism (VTE) is a critical cardiovascular condition that encompasses deep vein thrombosis (DVT) and pulmonary embolism (PE). Accurate and timely identification of VTE is essential for effective medical care. This study builds upon our previous work, which addressed VTE detection using deep learning methods for DVT and a hybrid approach combining deep learning and rule-based classification for PE. Our earlier approaches, while effective, had two major limitations: they were complex, and they required expert involvement for feature engineering of the rule set. To overcome these challenges, we use a classifier based on the Mamba architecture. This model achieves 97% accuracy and F1 score on the DVT dataset and 98% accuracy and F1 score on the PE dataset. In contrast to the previous hybrid method for PE identification, the Mamba classifier eliminates the need for hand-engineered rules, significantly reducing model complexity while maintaining comparable performance. Additionally, we evaluated a lightweight Large Language Model (LLM), Phi-3 Mini, for detecting VTE. While this model delivers competitive results and outperforms the baseline BERT models, it is computationally intensive because of its larger parameter count. Our evaluation shows that the Mamba-based model demonstrates superior performance and efficiency in VTE identification, offering an effective solution to the limitations of previous approaches.
