Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

Yasmeena Akhter; Rishabh Ranjan; Richa Singh; Mayank Vatsa

TL;DR

This work tackles accurate chest X-ray (CXR) diagnosis at very low resolutions by introducing MLCAK, a knowledge distillation (KD) framework that transfers multi-level self-attention knowledge from a high-resolution (HR) ViT teacher to a low-resolution (LR) ViT student in a multi-task setting. The framework optimizes two tasks via collaborative KD: MLCT for local lesion identification and MCCT for global normal/abnormal classification. The key transfer signal is the MLCAK term, $MLCAK = \frac{1}{N} \sum_{i=1}^{N} H_i$, and training minimizes the joint loss $L_{joint} = L_{KD} + L_{Classification}$, where $L_{KD} = \alpha L_{mse}(MLCT) + \beta L_{mse}(MCCT) + \gamma L_{mse}(MLCAK)$. Evaluations on the VinDr-CXR dataset show substantial AUROC gains for LR inputs down to $28 \times 28$, driven by improved attention localization and explainability, with consistent performance across ViT variants. This approach enables reliable, explainable CXR diagnosis in resource-constrained settings without requiring high-resolution imaging.
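To make the loss composition concrete, the following is a minimal pure-Python sketch of how the three distillation terms and the classification loss might combine. It is an illustration under stated assumptions, not the authors' implementation: it assumes each $L_{mse}$ term compares a teacher output with the matching student output, that each $H_i$ is a flattened attention map of equal length, and that the classification loss is computed elsewhere and passed in as a number. The names `mse`, `mean_of_maps`, `kd_loss`, `joint_loss`, and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_of_maps(maps: Sequence[Sequence[float]]) -> List[float]:
    """MLCAK = (1/N) * sum_i H_i, taken elementwise over N flattened maps."""
    n = len(maps)
    return [sum(column) / n for column in zip(*maps)]


def kd_loss(teacher: Dict, student: Dict,
            alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """L_KD = alpha*L_mse(MLCT) + beta*L_mse(MCCT) + gamma*L_mse(MLCAK).

    Assumption: every MSE term is teacher-vs-student on the same quantity.
    """
    l_mlct = mse(teacher["mlct"], student["mlct"])
    l_mcct = mse(teacher["mcct"], student["mcct"])
    l_mlcak = mse(mean_of_maps(teacher["attn"]), mean_of_maps(student["attn"]))
    return alpha * l_mlct + beta * l_mcct + gamma * l_mlcak


def joint_loss(teacher: Dict, student: Dict, classification_loss: float,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """L_joint = L_KD + L_Classification."""
    return kd_loss(teacher, student, alpha, beta, gamma) + classification_loss


# Toy values only; they are not taken from the paper.
teacher_out = {"mlct": [1.0, 0.0], "mcct": [1.0],
               "attn": [[0.2, 0.8], [0.4, 0.6]]}
student_out = {"mlct": [0.0, 0.0], "mcct": [0.5],
               "attn": [[0.5, 0.5], [0.5, 0.5]]}
print(joint_loss(teacher_out, student_out, classification_loss=0.21))
```

The weights `alpha`, `beta`, and `gamma` default to 1.0 here purely for illustration; the text above does not state the values used.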

Abstract

This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptive fields are lost, hampering diagnostic accuracy. To address this, this paper presents the Multilevel Collaborative Attention Knowledge (MLCAK) method. This approach leverages the self-attention mechanism of Vision Transformers (ViT) to transfer critical diagnostic knowledge from high-resolution images and thereby improve the diagnostic efficacy of low-resolution CXRs. MLCAK incorporates local pathological findings to boost model explainability, enabling more accurate global predictions in a multi-task framework tailored for low-resolution CXR analysis. Experiments on the VinDr-CXR dataset show a considerable improvement in the ability to diagnose diseases from low-resolution images (e.g. $28 \times 28$), suggesting a possible shift away from the traditional reliance on high-resolution imaging (e.g. $224 \times 224$).
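The loss of spatial detail described above can be illustrated with a small pure-Python sketch. It assumes downsizing by non-overlapping average pooling, which is only one possible resampling method; the abstract does not say how the low-resolution images were produced. The helper names `average_pool` and `pixel_ratio` are hypothetical.

```python
from typing import List


def average_pool(image: List[List[float]], factor: int) -> List[List[float]]:
    """Downsample a 2D image by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor")
    pooled = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def pixel_ratio(hr_side: int, lr_side: int) -> float:
    """How many HR pixels correspond to one LR pixel (square images)."""
    return (hr_side * hr_side) / (lr_side * lr_side)


# A synthetic 224 x 224 image with a single bright pixel standing in for a
# very small finding (toy data, not a real CXR).
hr = [[0.0] * 224 for _ in range(224)]
hr[100][100] = 1.0

lr = average_pool(hr, factor=8)  # 224 / 8 = 28
print(len(lr), len(lr[0]), pixel_ratio(224, 28), lr[12][12])
```

Under this assumption, one $28 \times 28$ pixel summarizes an $8 \times 8$ block of the $224 \times 224$ image, so an isolated one-pixel signal is diluted by a factor of 64.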

Paper Structure (8 sections, 5 equations, 3 figures, 2 tables)

Introduction
Related Work
Proposed MLCAK Framework
Experimental Setup
Results and Analysis
Conclusion and Discussion
Compliance with Ethical Standards
Acknowledgement

Figures (3)

Figure 1: Visual differences between CXR samples. (A): HR sample at $224 \times 224$ resolution. (B)-(D): Corresponding LR samples at $112 \times 112$, $56 \times 56$, and $28 \times 28$ resolution, respectively. Downsizing leads to loss of spatial information, resulting in poor diagnostic performance.
Figure 2: Overview of the proposed KD approach. The framework takes two inputs simultaneously: the teacher (T) receives the HR CXR and the student (S) receives the corresponding LR CXR. It generates two outputs in the MTL setup. $L_{MLCAK}$, $L_{MCCT}$ and $L_{MLCT}$ represent the three individual losses for Collaborative Knowledge Distillation.
Figure 3: Visual differences in the attention maps generated by the ViT Base model. (A): Original input with finding. (B)-(D): Attention maps of the proposed MLCAK student model at $112 \times 112$, $56 \times 56$, and $28 \times 28$ resolution, respectively. (E): Attention map of the teacher model at $224 \times 224$ resolution. (F)-(H): Attention maps of the baseline student model at $112 \times 112$, $56 \times 56$, and $28 \times 28$ resolution, respectively.
