DCAT: Dual Cross-Attention Fusion for Disease Classification in Radiological Images with Uncertainty Estimation

Jutika Borah, Hidam Kumarjit Singh

TL;DR

This work addresses the challenge of reliable disease classification in radiology when faced with uncertain and heterogeneous imaging data. It introduces DCAT, a dual cross-attention fusion framework that jointly leverages EfficientNetB4 and ResNet34 by performing bidirectional cross-attention to fuse multi-scale features, followed by refined channel and spatial attention via an enhanced CBAM. The model incorporates MC-Dropout to quantify predictive uncertainty, reporting high performance across four datasets (Covid-19, TB, Pneumonia chest X-ray, and retinal OCT) while providing entropy-based uncertainty visualizations for interpretability. By combining hierarchical multi-scale fusion, attention-guided feature refinement, and principled uncertainty estimation, DCAT improves diagnostic reliability and supports clinically informed decision-making through interpretable uncertainty cues and visual explanations.
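As a rough illustration of the entropy-based uncertainty cue mentioned above, the sketch below averages the class probabilities from several stochastic (dropout-enabled) forward passes and computes the entropy of that average. It is a minimal sketch, not the authors' implementation: the function name `predictive_entropy` and the sample probabilities are hypothetical, and it assumes the softmax outputs of each MC-Dropout pass have already been collected.

```python
import math

def predictive_entropy(mc_probs):
    """Return (mean class probabilities, entropy in nats).

    mc_probs: list of T probability vectors, one per stochastic forward
    pass of the same input (hypothetical, pre-collected softmax outputs).
    """
    num_passes = len(mc_probs)
    num_classes = len(mc_probs[0])
    # Monte Carlo estimate of the predictive distribution.
    mean_probs = [
        sum(p[c] for p in mc_probs) / num_passes for c in range(num_classes)
    ]
    # Shannon entropy of the averaged distribution; skip zero terms.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy

# Illustrative values only: three passes that agree, three that disagree.
confident = [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]]
ambiguous = [[0.90, 0.10], [0.20, 0.80], [0.45, 0.55]]

_, h_confident = predictive_entropy(confident)
_, h_ambiguous = predictive_entropy(ambiguous)
print(h_confident < h_ambiguous)
```

Passes that disagree push the averaged distribution toward uniform, so the entropy approaches its maximum of ln(K) for K classes; such samples are the ones a visualization would flag as highly uncertain.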

Abstract

Accurate and reliable image classification is crucial in radiology, where diagnostic decisions significantly impact patient outcomes. Conventional deep learning models tend to produce overconfident predictions despite underlying uncertainty, potentially leading to misdiagnoses. Attention mechanisms enable models to focus on the relevant parts of the input data and, combined with feature fusion, can help address this uncertainty. Cross-attention in particular has become increasingly important in medical image analysis for capturing dependencies across features and modalities. This paper proposes a novel dual cross-attention fusion model for medical image analysis that addresses key challenges in feature integration and interpretability. Our approach introduces a bidirectional cross-attention mechanism that dynamically fuses feature maps from EfficientNetB4 and ResNet34, leveraging contextual dependencies across the two networks. The fused features are then refined through channel and spatial attention to highlight discriminative patterns crucial for accurate classification. The proposed model achieved AUC of 99.75%, 100%, 99.93%, and 98.69% and AUPR of 99.81%, 100%, 99.97%, and 96.36% on Covid-19, Tuberculosis, and Pneumonia chest X-ray images and retinal OCT images, respectively. Entropy values and examples of highly uncertain samples provide an interpretable visualization of the model's predictions, enhancing transparency. By combining multi-scale feature extraction, bidirectional attention, and uncertainty estimation, the proposed model targets both accuracy and reliability in medical image analysis.
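The bidirectional cross-attention idea described in the abstract can be sketched with standard scaled dot-product attention, where each network's features act as queries over the other network's features. This is a minimal NumPy sketch under stated assumptions, not the paper's architecture: the function names, the random projection matrices, the feature dimensions, and the choice to concatenate the two attended outputs are all hypothetical, and both feature maps are assumed to have been flattened to the same number of spatial tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention: `queries` attend to `context`."""
    q = queries @ w_q                       # (n_q, d)
    k = context @ w_k                       # (n_c, d)
    v = context @ w_v                       # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ v, weights

def dual_cross_attention(feat_a, feat_b, params_ab, params_ba):
    """Run attention in both directions and concatenate (an assumption)."""
    a_attends_b, _ = cross_attention(feat_a, feat_b, *params_ab)
    b_attends_a, _ = cross_attention(feat_b, feat_a, *params_ba)
    return np.concatenate([a_attends_b, b_attends_a], axis=-1)

rng = np.random.default_rng(0)
n_tokens, dim_a, dim_b, dim_attn = 16, 32, 24, 8

# Random stand-ins for flattened backbone feature maps (tokens x channels).
feat_a = rng.standard_normal((n_tokens, dim_a))  # e.g. EfficientNetB4 branch
feat_b = rng.standard_normal((n_tokens, dim_b))  # e.g. ResNet34 branch

# Projection matrices (W_q, W_k, W_v) for each direction; learned in practice.
params_ab = (rng.standard_normal((dim_a, dim_attn)),
             rng.standard_normal((dim_b, dim_attn)),
             rng.standard_normal((dim_b, dim_attn)))
params_ba = (rng.standard_normal((dim_b, dim_attn)),
             rng.standard_normal((dim_a, dim_attn)),
             rng.standard_normal((dim_a, dim_attn)))

fused = dual_cross_attention(feat_a, feat_b, params_ab, params_ba)
print(fused.shape)
```

In a trained model the projections would be learned layers and the fused output would pass on to the channel and spatial attention stage; the sketch only shows how information flows in both directions between the two feature sets.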

Paper Structure

This paper contains 28 sections, 27 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: (a) Random example images from the four datasets. From top: Tuberculosis chest X-ray, Covid chest X-ray, Pneumonia chest X-ray, and retinal OCT images. (b) Visualization of class distribution for each dataset.
  • Figure 2: t-SNE plots for visualization and comparison of distributions across datasets to identify similarities (overlap) and differences between (a) the original Covid-19 dataset with four classes and the Pneumonia Chest X-ray dataset, and (b) the TB and Pneumonia Chest X-ray datasets.
  • Figure 3: Overview of the proposed DCAT fusion mechanism. The pipeline starts with acquiring chest X-ray and OCT images. The proposed model comprises local and global feature learning by EfficientNetB4 and ResNet34, cross-attention fusion, and predictive uncertainty estimation, which quantifies the uncertainty of the model.
  • Figure 4: (a) Channel-attention and (b) spatial attention module as part of the CBAM used in this study. The proposed DCAT fuses feature maps from the two pretrained models before applying channel and spatial attention, enabling our proposed model to leverage information from the two networks and enhancing the feature representations for the classification task.
  • Figure 5: AUROC Curve and Precision-Recall Curve. From the top-left: Retinal OCT (a) AUROC Curve (b) Precision-Recall Curve, Covid-19 Chest X-ray (c) AUROC Curve (f) Precision-Recall Curve, Pneumonia Chest X-ray (d) AUROC Curve (g) Precision-Recall Curve, and Tuberculosis Chest X-ray (e) AUROC Curve (h) Precision-Recall Curve.
  • ...and 3 more figures