Good Enough: Is it Worth Improving your Label Quality?
Alexander Jaus, Zdravko Marinov, Constantin Seibold, Simon Reiß, Jens Kleesiek, Rainer Stiefelhagen
TL;DR
This work investigates whether investing in higher-quality labels for medical CT segmentation is worthwhile. By generating seven pseudo-label datasets from diverse predictors (including nnU-Net, TotalSegmentator, MedSAM, and STU-Net variants) across five base CT datasets, the authors create an independent benchmark to study how label quality affects in-domain performance and pre-training transfer. They find that in-domain gains track label quality and can be substantial, but small improvements in label quality yield dataset-dependent or negligible benefits, and pre-training benefits are largely insensitive to label quality. The study concludes that label refinement should be prioritized for in-domain segmentation tasks where substantial improvements are achievable, whereas its value for pre-training transfer is limited.
Abstract
Improving label quality in medical image segmentation is costly, but its benefits remain unclear. We systematically evaluate its impact using multiple pseudo-labeled versions of CT datasets, generated by models such as nnU-Net, TotalSegmentator, and MedSAM. Our results show that higher-quality labels improve in-domain performance, but the gains become unclear when the improvement in label quality falls below a small threshold. For pre-training, label quality has minimal impact, suggesting that models transfer general concepts rather than detailed annotations. These findings provide guidance on when improving label quality is worth the effort.
