Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning
Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa
TL;DR
This work tackles accurate chest X-ray (CXR) diagnosis at very low resolutions by introducing MLCAK, a knowledge distillation (KD) framework that transfers multi-level self-attention knowledge from a high-resolution (HR) ViT teacher to a low-resolution (LR) ViT student in a multi-task setting. The framework optimizes two tasks, MLCT for local lesion identification and MCCT for global normal/abnormal classification, through collaborative KD. The key transfer signal is the MLCAK term, $\mathrm{MLCAK} = \frac{1}{N} \sum_{i=1}^{N} H_i$, and training minimizes the joint loss $L_{joint} = L_{KD} + L_{Classification}$, where $L_{KD} = \alpha L_{mse}(\mathrm{MLCT}) + \beta L_{mse}(\mathrm{MCCT}) + \gamma L_{mse}(\mathrm{MLCAK})$. Evaluations on the VinDr-CXR dataset show substantial AUROC gains for LR inputs as small as $28\times28$, driven by improved attention localization and explainability, with consistent performance across ViT variants. This approach enables reliable, explainable CXR diagnosis in resource-constrained settings without requiring high-resolution imaging.
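The joint objective can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the MLCT and MCCT outputs are treated as flat feature vectors produced by teacher and student, each $H_i$ is assumed to be a flattened attention map already resized to a shape shared by both models, $L_{Classification}$ is passed in as a precomputed number, and the helper names (`mse`, `mlcak`, `joint_loss`) and the default weights $\alpha = \beta = \gamma = 1$ are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("teacher and student signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mlcak(maps: Sequence[Vector]) -> list[float]:
    """Elementwise average of N flattened attention maps: (1/N) * sum_i H_i.

    Assumption: all maps share one shape (e.g. after resizing the
    student's LR attention grid to match the teacher's).
    """
    n = len(maps)
    return [sum(values) / n for values in zip(*maps)]


def joint_loss(
    t_mlct: Vector, s_mlct: Vector,            # teacher/student MLCT outputs (assumed vectors)
    t_mcct: Vector, s_mcct: Vector,            # teacher/student MCCT outputs (assumed vectors)
    t_maps: Sequence[Vector], s_maps: Sequence[Vector],  # attention maps H_i
    l_cls: float,                              # precomputed L_Classification
    alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0,  # hypothetical weights
) -> float:
    """L_joint = L_KD + L_Classification, with L_KD as a weighted sum of MSE terms."""
    l_kd = (
        alpha * mse(t_mlct, s_mlct)
        + beta * mse(t_mcct, s_mcct)
        + gamma * mse(mlcak(t_maps), mlcak(s_maps))
    )
    return l_kd + l_cls


# Toy numbers: MLCT term 0.5, MCCT term 4.0, MLCAK term 0.0, classification 0.25
print(joint_loss(
    [1.0, 2.0], [1.0, 1.0],
    [0.0], [2.0],
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]],
    l_cls=0.25,
))  # 4.75
```

Note that in the toy call the two sets of attention maps differ individually yet average to the same MLCAK vector, so the MLCAK term contributes zero: under this sketch, only the head-averaged attention is matched, not each map.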
Abstract
This research addresses the challenge of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptive fields are lost, hampering diagnostic accuracy. To address this, we present the Multilevel Collaborative Attention Knowledge (MLCAK) method. This approach leverages the self-attention mechanism of Vision Transformers (ViT) to transfer critical diagnostic knowledge from high-resolution images and thereby improve diagnosis from low-resolution CXRs. MLCAK incorporates local pathological findings to boost model explainability, enabling more accurate global predictions in a multi-task framework tailored for low-resolution CXR analysis. Experiments on the VinDr-CXR dataset show a considerable improvement in diagnosing diseases from low-resolution images (e.g., 28 × 28), suggesting a shift away from the traditional reliance on high-resolution imaging (e.g., 224 × 224).
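To make the resolution gap concrete, the following Python sketch shrinks a synthetic 224 × 224 image to 28 × 28. It assumes plain 8 × 8 block averaging as a stand-in for whatever resizing a real CAD pipeline uses (bilinear or area interpolation are common choices), and the image with its 4 × 4 bright "lesion" is purely hypothetical. The pixel count drops 64-fold, and because the lesion's 16 pixels are averaged into a single 64-pixel block, its peak contrast falls from 1.0 to 0.25.

```python
def block_mean_downsample(img: list[list[float]], factor: int) -> list[list[float]]:
    """Downsample a square image (list of rows) by averaging non-overlapping
    factor x factor blocks. Assumption: a simple stand-in for real resizing."""
    size = len(img)
    if size % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for bi in range(size // factor):
        row = []
        for bj in range(size // factor):
            total = sum(
                img[bi * factor + di][bj * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            row.append(total / (factor * factor))
        out.append(row)
    return out


HR, LR = 224, 28
factor = HR // LR  # 8 pixels per side collapse into one LR pixel

# Hypothetical synthetic image: zero background, 4x4 bright "lesion" of value 1.0
hr = [[0.0] * HR for _ in range(HR)]
for r in range(96, 100):
    for c in range(96, 100):
        hr[r][c] = 1.0

lr = block_mean_downsample(hr, factor)

print(HR * HR // (LR * LR))  # 64 (pixel-count reduction)
print(max(max(row) for row in hr), max(max(row) for row in lr))  # 1.0 0.25
```

Here the lesion happens to fall inside one block; if it straddled block boundaries its intensity would be split across neighbouring LR pixels and diluted even further.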
