Lung Infection Severity Prediction Using Transformers with Conditional TransMix Augmentation and Cross-Attention

Bouthaina Slika, Fadi Dornaika, Fares Bougourzi, Karim Hammoudi

TL;DR

This work tackles pneumonia severity prediction from chest X-rays and CT scans by introducing QCross-Att-PVT, a parallel-transformer architecture that combines four quadrant encoders, a cross-gated attention mechanism, and a ViT-based feature aggregator to produce a robust scalar severity score. To address data imbalance, the authors propose Conditional Online TransMix, a patch-based augmentation guided by attention maps that rebalances underrepresented severity levels. Across the RALO CXR and Per-COVID-19 CT datasets, the method achieves state-of-the-art performance, with low mean absolute error and high correlation with expert scores, and extensive ablation studies validate the contributions of the TransMix augmentation, the gated cross-attention, and the four-quadrant design. The approach also shows strong cross-modality generalization and interpretability through attention maps, making it a promising, adaptable tool for clinical decision support in pneumonia severity assessment.
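
The core idea of TransMix-style augmentation for a regression target can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a rectangular region from image `x_b` is pasted into image `x_a` (as in CutMix), and the mixing weight for the two scalar severity scores is the share of attention that falls on the pasted patches, rather than the pasted area. The attention map is taken as a given input (e.g. class-token attention over the patch grid); the helper names and, in particular, the `rare_scores` condition in `conditional_transmix` are hypothetical placeholders, since the exact condition used by Conditional Online TransMix is not described here.

```python
import numpy as np

def transmix_pair(x_a, x_b, y_a, y_b, attn, box, patch):
    """Paste a box from x_b into x_a and mix the scalar targets by attention mass.

    x_a, x_b : (H, W) images, with H and W multiples of `patch`
    y_a, y_b : scalar severity scores
    attn     : (H // patch, W // patch) non-negative attention over the patch
               grid of the mixed image; normalised here to sum to 1
    box      : (r0, r1, c0, c1) region taken from x_b, in patch units
    """
    r0, r1, c0, c1 = box
    # Patch-level mask: 1 where the mixed image shows content from x_b.
    mask = np.zeros(attn.shape, dtype=float)
    mask[r0:r1, c0:c1] = 1.0
    # Upsample the mask to pixel resolution and paste the region.
    pixel_mask = np.kron(mask, np.ones((patch, patch)))
    x_mix = pixel_mask * x_b + (1.0 - pixel_mask) * x_a
    # TransMix: weight = attention mass on pasted patches (CutMix would use area).
    a = attn / attn.sum()
    lam = float((a * mask).sum())
    y_mix = lam * y_b + (1.0 - lam) * y_a
    return x_mix, y_mix, lam

def conditional_transmix(x_a, x_b, y_a, y_b, attn, box, patch, rare_scores):
    """Hypothetical condition: only mix when y_b is an under-represented score."""
    if y_b not in rare_scores:
        return x_a, y_a, 0.0  # leave the sample unchanged
    return transmix_pair(x_a, x_b, y_a, y_b, attn, box, patch)

# Usage: a 64x64 image on a 4x4 patch grid, pasting the top-left 2x2 patches.
x_a, x_b = np.zeros((64, 64)), np.ones((64, 64))
x_mix, y_mix, lam = transmix_pair(x_a, x_b, 2.0, 6.0, np.ones((4, 4)), (0, 2, 0, 2), 16)
```

With uniform attention, the weight reduces to the pasted area fraction (4 of 16 patches, so 0.25), which is exactly CutMix; the two methods differ only when attention concentrates inside or outside the pasted region.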

Abstract

Lung infections, particularly pneumonia, pose serious health risks that can escalate rapidly, especially during pandemics. Accurate AI-based severity prediction from medical imaging is essential to support timely clinical decisions and optimize patient outcomes. In this work, we present a novel method applicable to both CT scans and chest X-rays for assessing lung infection severity. Our contributions are twofold: (i) QCross-Att-PVT, a Transformer-based architecture that integrates parallel encoders, a cross-gated attention mechanism, and a feature aggregator to capture rich multi-scale features; and (ii) Conditional Online TransMix, a custom data augmentation strategy designed to address dataset imbalance by generating mixed-label image patches during training. Evaluated on two benchmark datasets, RALO CXR and Per-COVID-19 CT, our method consistently outperforms several state-of-the-art deep learning models. The results emphasize the critical role of data augmentation and gated attention in improving both robustness and predictive accuracy. This approach offers a reliable, adaptable tool to support clinical diagnosis, disease monitoring, and personalized treatment planning. The source code of this work is available at https://github.com/bouthainas/QCross-Att-PVT.
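
The data flow of a quadrant-based model with gated cross-attention can be illustrated with a small NumPy sketch. It is not the QCross-Att-PVT code: each of the four parallel PVT encoders is replaced by a hypothetical linear projection of flattened patches, the gate is assumed to be an element-wise sigmoid computed from the query tokens, and the ViT aggregator is replaced by mean pooling plus a linear head. All function and parameter names (`split_quadrants`, `gated_cross_attention`, `W_enc`, `Wg`, `w_head`, ...) are illustrative assumptions.

```python
import numpy as np

def split_quadrants(img):
    """Return the four quadrants (TL, TR, BL, BR) of an (H, W) image."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def patchify(q, patch):
    """Flatten non-overlapping patch x patch tiles into (N, patch*patch) tokens."""
    H, W = q.shape
    assert H % patch == 0 and W % patch == 0
    t = q.reshape(H // patch, patch, W // patch, patch).transpose(0, 2, 1, 3)
    return t.reshape(-1, patch * patch)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(x, ctx, Wq, Wk, Wv, Wg):
    """x: (N, d) tokens of one quadrant; ctx: (M, d) tokens of the other quadrants."""
    q, k, v = x @ Wq, ctx @ Wk, ctx @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, M) attention weights
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg)))          # assumed sigmoid gate in (0, 1)
    return x + gate * (attn @ v)                     # gated residual update

def predict_severity(img, params, patch=8):
    quads = split_quadrants(img)
    # Stand-in for the four parallel encoders: one linear projection per quadrant.
    feats = [patchify(q, patch) @ W for q, W in zip(quads, params["W_enc"])]
    fused = []
    for i, x in enumerate(feats):
        # Each quadrant attends to the tokens of the other three quadrants.
        ctx = np.concatenate([f for j, f in enumerate(feats) if j != i], axis=0)
        fused.append(gated_cross_attention(
            x, ctx, params["Wq"], params["Wk"], params["Wv"], params["Wg"]))
    # Stand-in aggregator: mean-pool all tokens, then a linear regression head.
    pooled = np.concatenate(fused, axis=0).mean(axis=0)
    return float(pooled @ params["w_head"])

# Usage with random weights on a 64x64 image (d = 16 feature channels).
rng = np.random.default_rng(0)
d, p = 16, 8
params = {"W_enc": [rng.normal(size=(p * p, d)) * 0.1 for _ in range(4)],
          **{k: rng.normal(size=(d, d)) * 0.1 for k in ("Wq", "Wk", "Wv", "Wg")},
          "w_head": rng.normal(size=d)}
score = predict_severity(rng.random((64, 64)), params, patch=p)
```

The gate lets each quadrant decide, per feature, how much information from the other lung regions to admit; setting the value projection to zero leaves the quadrant features unchanged.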

Paper Structure

This paper contains 18 sections, 7 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Illustration of the proposed QCross-Att-PVT model.
  • Figure 2: Gated Attention.
  • Figure 3: TransMix applied to CT images.
  • Figure 4: RALO training set scores distribution.
  • Figure 5: Per-COVID-19 training set with CIP score distribution.
  • ...and 3 more figures