Impact of domain adaptation in deep learning for medical image classifications
Yihang Wu, Ahmad Chaddad
TL;DR
Domain adaptation (DA) addresses domain shift in medical image classification by aligning source and target distributions in a shared feature space, with the aim of improving $p(Y|X)$ on the target domain when labels are scarce. The study uses 10 deep learning models to simulate common DA techniques on four public datasets (SC HAM10000, BT MRI, CC Chest X-ray, MC multi-modality) and evaluates them under noisy data, federated learning (FL), interpretability analysis, and calibration analysis. Key findings include a gain of about $4.7\%$ on BT with ResNet34, a gain of about $3\%$ under Gaussian noise, and improvements of $2\%$–$3\%$ on MC, plus improved interpretability (Grad-CAM++) and calibration (ECE reductions of around $2\%$–$7\%$); some cases show negative transfer or limited FL gains. These results suggest that DA is beneficial for MRI/CT brain-tumor and multi-modality tasks but requires dataset- and architecture-specific tuning, particularly for imbalanced data and federated settings.
Abstract
Domain adaptation (DA) is a rapidly expanding area of machine learning that adapts a model trained in one domain so that it performs well in another. Despite notable progress, the core idea behind many DA methods has remained the same: align the data from different domains in a shared feature space. In this space, knowledge acquired from labeled source data can improve training on target data that lacks sufficient labels. In this study, we use 10 deep learning models to simulate common DA techniques and explore their application to four medical image datasets. We consider several scenarios: multi-modality, noisy data, federated learning (FL), interpretability analysis, and classifier calibration. The experimental results indicate that using DA with ResNet34 on a brain tumor (BT) dataset improves model performance by $4.7\%$. Similarly, DA can reduce the impact of Gaussian noise, providing a $\sim 3\%$ accuracy increase with ResNet34 on a BT dataset. In contrast, simply introducing DA into an FL framework shows limited potential (e.g., a $\sim 0.3\%$ increase in performance) for skin cancer classification. In addition, DA can improve the interpretability of the models, as assessed with the Grad-CAM++ technique, which offers clinical value. Calibration analysis also shows that DA lowers the expected calibration error (ECE) by $\sim 2\%$ compared to a CNN alone on a multi-modality dataset.
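For readers unfamiliar with the calibration metric, the following is a minimal sketch of how ECE is commonly computed, assuming the standard equal-width binning formulation $\mathrm{ECE} = \sum_b \frac{|B_b|}{N}\,|\mathrm{acc}(B_b) - \mathrm{conf}(B_b)|$. The abstract does not state the binning scheme or number of bins used in the study, so the function name `expected_calibration_error` and the default `n_bins=10` are illustrative assumptions, not the authors' implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative; bin count is an assumption).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction matched the label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin accumulators: [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = n_correct / count
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece


# Toy inputs, not data from the study.
toy_conf = [0.95, 0.95, 0.55, 0.55]
toy_correct = [True, True, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_correct)
```

A lower value means that predicted confidences track observed accuracy more closely, which is the sense in which the abstract reports DA as improving calibration.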
