Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook
Yuan Ma, Junlin Hou, Chao Zhang, Yukun Zhou, Zongyuan Ge, Haoran Xie, Lie Ju
TL;DR
LNMBench provides a unified, public framework to benchmark learning-with-noisy-label methods in medical imaging across synthetic and real-world noise, revealing that most existing approaches struggle under high or real-world noise and class imbalance. The study introduces MedSSL as a practical improvement for semi-supervised LNL, and delivers extensive analyses of dataset balance, sample selection, transition-matrix estimation, and real-world noise behavior. Key contributions include a standardized evaluation protocol, a comprehensive cross-dataset comparison of 10 methods, and actionable insights regarding robustness gaps and future research directions in medicine. The work emphasizes the need for real-world, per-sample noise models and purification-based strategies to enable robust deployment in clinical settings.
Abstract
Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses 10 representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly released at https://github.com/myyy777/LNMBench to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications.
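To make the idea of a synthetic noise pattern concrete, the following is a minimal sketch of symmetric (uniform) label-noise injection, a pattern commonly used in the LNL literature. The abstract does not specify which three noise patterns LNMBench uses, so this choice is an assumption, and the function names `inject_symmetric_noise` and `observed_noise_rate` are hypothetical and not taken from the LNMBench codebase.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with symmetric noise applied.

    Illustrative assumption, not the LNMBench implementation: each label is
    flipped with probability `noise_rate` to a different class chosen
    uniformly at random. Requires num_classes >= 2.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Choose uniformly among all classes except the true one.
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy


def observed_noise_rate(clean, noisy):
    """Fraction of samples whose noisy label differs from the clean label."""
    return sum(c != n for c, n in zip(clean, noisy)) / len(clean)


# Toy example: 1000 samples spread evenly over 4 classes, 40% target noise.
clean = [i % 4 for i in range(1000)]
noisy = inject_symmetric_noise(clean, num_classes=4, noise_rate=0.4)
print(observed_noise_rate(clean, noisy))
```

Because every flipped label is forced to differ from the true class, the observed noise rate concentrates around `noise_rate`. Real-world noise, which the benchmark also covers, typically depends on the individual sample and cannot be reproduced by a class-independent flip of this kind.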
