Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage CT Image Classification
Jialiang Fan, Xinhui Fan, Chengyan Song, Xiaofan Wang, Bingdong Feng, Lucan Li, Guoyu Lu
TL;DR
This work addresses the urgent need for rapid intracerebral hemorrhage (ICH) diagnosis from CT scans by introducing a real-world dataset annotated for two tasks: normal vs. ICH, and three hemorrhage-location classes. It proposes DTViT, a dual-task Vision Transformer in which a shared encoder feeds two MLP decoders that simultaneously detect ICH and classify its location, trained on a balanced, augmented dataset with a dual-task loss. The model achieves near-perfect test performance (up to 0.999 accuracy on the test set; 0.996–0.999 for task 2) and outperforms classical CNN baselines, demonstrating the effectiveness of transformer-based dual-task architectures in medical imaging. Together with the accompanying dataset, this approach could speed up clinical decision-making and improve treatment planning for ICH patients, especially in resource-limited settings.
Abstract
Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disability or death. Diagnosis and analysis of ICH typically rely on brain CT imaging. Because ICH is urgent and early treatment is crucial, CT images must be analyzed rapidly to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. We therefore collect a real-world dataset annotated for two tasks: distinguishing ICH from normal scans, and classifying ICH images into three types by hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a neural network architecture, the dual-task vision transformer (DTViT), for the automated classification and diagnosis of ICH images. DTViT uses the encoder from the Vision Transformer (ViT), which employs attention mechanisms to extract features from CT images. The framework also incorporates two multilayer perceptron (MLP)-based decoders that simultaneously identify the presence of ICH and classify the hemorrhage location into one of the three types. Experimental results demonstrate that DTViT performs well on the real-world test dataset. The code and newly collected dataset for this work are available at: https://github.com/jfan1997/DTViT.
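To make the dual-task idea concrete, the following is a minimal sketch in plain Python of how two decoder heads might share one encoder feature vector and be trained with a combined loss. It is not taken from the released code. The function names, the `loc_weight` parameter, the use of single linear layers in place of full MLP decoders, and the choice to skip the location term for normal scans are all assumptions made for illustration; the abstract does not specify how the two loss terms are combined.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy of a single example against an integer class label."""
    return -math.log(softmax(logits)[target])

def linear(features, weights, bias):
    """One linear layer; stands in for an MLP decoder head (assumption)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def dual_task_loss(ich_logits, loc_logits, ich_label, loc_label, loc_weight=1.0):
    """Weighted sum of the two task losses.

    Assumption: normal scans carry no location label (loc_label is None),
    so only the ICH-vs-normal term contributes for them.
    """
    loss = cross_entropy(ich_logits, ich_label)
    if loc_label is not None:
        loss += loc_weight * cross_entropy(loc_logits, loc_label)
    return loss

# Hypothetical shared feature vector, standing in for the ViT encoder output.
features = [0.5, -1.0, 2.0]

# Head 1: normal (0) vs. ICH (1). Head 2: Deep (0), Subcortical (1), Lobar (2).
ich_logits = linear(features, [[0.1, 0.0, -0.2], [-0.1, 0.3, 0.4]], [0.0, 0.0])
loc_logits = linear(features, [[0.2, 0.1, 0.0], [0.0, -0.1, 0.3], [0.1, 0.0, 0.1]],
                    [0.0, 0.0, 0.0])

# An ICH scan labelled Lobar contributes both terms to the loss.
loss = dual_task_loss(ich_logits, loc_logits, ich_label=1, loc_label=2)
```

Because both heads read the same feature vector, gradients from both loss terms would flow into the shared encoder during training, which is the usual motivation for a shared-encoder, multi-head design.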
