Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage CT Image Classification

Jialiang Fan, Xinhui Fan, Chengyan Song, Xiaofan Wang, Bingdong Feng, Lucan Li, Guoyu Lu

TL;DR

This work addresses the urgent need for rapid intracerebral hemorrhage (ICH) diagnosis from CT scans by introducing a real-world dataset annotated for normal vs. ICH and three hemorrhage-location classes. It proposes DTViT, a dual-task Vision Transformer that shares an encoder between two MLP decoders to simultaneously detect ICH and classify its location, trained with a balanced, augmented dataset and a dual-task loss. The model achieves near-perfect test performance (up to 0.999 accuracy on the test set; 0.996–0.999 for task 2) and outperforms classical CNN baselines, demonstrating the effectiveness of transformer-based dual-task architectures in medical imaging. This approach, along with the accompanying dataset, holds potential for speeding up clinical decision-making and improving treatment planning for ICH patients, especially in resource-limited settings.

Abstract

Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disability or death. Diagnosis and analysis of ICH typically rely on brain CT imaging. Because ICH is urgent, early treatment is crucial, which requires rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. Therefore, we collect a real-world dataset for two tasks: classifying CT images as ICH or normal, and classifying ICH images into three types by hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a neural network structure, the dual-task vision transformer (DTViT), for the automated classification and diagnosis of ICH images. DTViT uses the encoder from the Vision Transformer (ViT), which applies attention mechanisms to extract features from CT images. The framework also incorporates two multilayer perceptron (MLP)-based decoders that simultaneously identify the presence of ICH and classify the hemorrhage location into one of the three types. Experimental results demonstrate that DTViT performs well on the real-world test dataset. The code and newly collected dataset for this work are available at: https://github.com/jfan1997/DTViT.
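
The dual-task idea in the abstract (one shared feature vector feeding two classification heads, trained with a combined loss) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: each MLP decoder is reduced to a single linear layer for brevity, and the function names, the weighted-sum form of the loss, and the weights `w_ich` and `w_loc` are all assumptions made for illustration. The exact loss used by DTViT is defined in the paper and the linked repository.

```python
import math

def linear(weights, bias, features):
    """Apply one linear layer; `weights` holds one row per output class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def dual_task_loss(features, ich_head, loc_head, ich_label, loc_label,
                   w_ich=1.0, w_loc=1.0):
    """Combine both head losses computed from the same shared features.

    ASSUMPTION: a weighted sum of two cross-entropy terms. The weights and
    this exact form are hypothetical, not taken from the paper.
    """
    ich_logits = linear(*ich_head, features)  # 2 classes: normal, ICH
    loc_logits = linear(*loc_head, features)  # 3 classes: Deep, Subcortical, Lobar
    loss_ich = cross_entropy(ich_logits, ich_label)
    loss_loc = cross_entropy(loc_logits, loc_label)
    return w_ich * loss_ich + w_loc * loss_loc

# Usage: a toy 2-dimensional feature vector standing in for the encoder output.
features = [0.5, -1.0]
ich_head = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])            # (weights, bias)
loc_head = ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
loss = dual_task_loss(features, ich_head, loc_head, ich_label=1, loc_label=2)
print(loss)  # with all-zero heads this equals ln(2) + ln(3)
```

Because both terms depend on the same features, gradients from detection and from location classification would both update a shared encoder, which is the property the dual-task design relies on.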

Paper Structure

This paper contains 18 sections, 11 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Normal and brain hemorrhages in three different locations.
  • Figure 2: Noise images that have been removed from the dataset.
  • Figure 3: Morphological treatment of CT images.
  • Figure 4: The research diagram of the DTViT model.
  • Figure 5: Accuracy and loss curves during training and validation on datasets with and without augmentation. (a) Training and validation losses with augmentation. (b) Training and validation accuracies with augmentation. (c) Training and validation losses without augmentation. (d) Training and validation accuracies without augmentation.
  • ...and 1 more figures