Active Label Cleaning for Reliable Detection of Electron Dense Deposits in Transmission Electron Microscopy Images
Jieyun Tan, Shuo Liu, Guibin Zhang, Ziqi Li, Jian Geng, Lei Zhang, Lei Cao
TL;DR
This study tackles reliable detection of electron-dense deposits (EDD) in TEM images when labels are scarce and crowdsourced annotations are noisy. It introduces a two-step active label cleaning framework built around a Label Selection Module that (i) selects the most informative samples for expert re-annotation and (ii) grades instance-level noise to guide automatic correction and expert review. The approach combines dual models (a cleaning model and a consensus model) with an active learning loop and a Bib Noise Correction Module. It achieves $AP_{50}=67.18\%$ on private data (an $18.83\%$ improvement over noisy training and $95.79\%$ of Clean Training) while reducing annotation cost by $73.30\%$, and $AP_{50}=88.63\%$ on a public dataset. These results indicate that targeted, cost-efficient label cleaning can support robust medical AI where expert resources are limited, and the public-dataset result suggests the method transfers beyond the private data.
Abstract
Automated detection of electron dense deposits (EDD) in glomerular disease is hindered by the scarcity of high-quality labeled data. While crowdsourcing reduces annotation cost, it introduces label noise. We propose an active label cleaning method to efficiently denoise crowdsourced datasets. Our approach uses active learning to select the most valuable noisy samples for expert re-annotation, and the re-annotated samples are used to build high-accuracy cleaning models. A Label Selection Module leverages discrepancies between crowdsourced labels and model predictions for both sample selection and instance-level noise grading. Experiments show our method achieves 67.18% $AP_{50}$ on a private dataset, an 18.83% improvement over training on noisy labels. This performance reaches 95.79% of that with full expert annotation while reducing annotation cost by 73.30%. The method provides a practical, cost-effective solution for developing reliable medical AI with limited expert resources.
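To make the role of the Label Selection Module more concrete, the following is a minimal sketch of how discrepancies between crowdsourced boxes and model predictions could drive both instance-level noise grading and sample selection. It is an illustration under stated assumptions, not the authors' implementation: the IoU-based matching, the thresholds `hi` and `lo`, the grade names, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def grade_instances(crowd: Sequence[Box], preds: Sequence[Box],
                    hi: float = 0.75, lo: float = 0.3) -> List[str]:
    """Grade each crowdsourced box by its best overlap with any prediction.

    Assumed grades (hypothetical): 'clean' needs no action, 'correctable' is
    a candidate for automatic correction, 'review' is sent to an expert.
    """
    grades = []
    for c in crowd:
        best = max((iou(c, p) for p in preds), default=0.0)
        if best >= hi:
            grades.append("clean")
        elif best >= lo:
            grades.append("correctable")
        else:
            grades.append("review")
    return grades


def image_discrepancy(crowd: Sequence[Box], preds: Sequence[Box]) -> float:
    """Fraction of crowdsourced instances that are not graded 'clean'."""
    if not crowd:
        # No crowd labels: maximal disagreement if the model predicts anything.
        return 1.0 if preds else 0.0
    grades = grade_instances(crowd, preds)
    return sum(g != "clean" for g in grades) / len(grades)


def select_for_expert(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k images with the highest label/prediction discrepancy."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In an active learning loop of the kind the abstract describes, `select_for_expert` would be called once per round on the per-image scores, the selected images would be re-annotated by experts, and the cleaning model would be retrained before the next round. The thresholds would need tuning on held-out expert labels.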
