Improving the generalization of deep learning models in the segmentation of mammography images
Jan Hurtado, Joao P. Maia, Cesar A. Sierra-Franco, Alberto Raposo
TL;DR
The paper tackles the challenge of segmenting landmark structures in mammography images acquired with equipment from different vendors. It proposes two data-centric augmentation strategies, annotation-guided image intensity manipulation and style transfer, that enrich the training data and improve generalization without additional manual labeling. In experiments that train on GE data and evaluate on IMS, PLANMED, and HOLOGIC datasets (including CC and DDSM cases), the methods improve robustness and reduce prediction uncertainty, with the combined strategy giving the most consistent performance. The findings suggest practical potential for clinical deployment: better cross-vendor segmentation accuracy at lower labeling cost.
Abstract
Mammography is the main screening method for detecting breast cancer early, which improves treatment success rates. Segmenting landmark structures in mammography images can support the clinical assessment of cancer risk and of image acquisition adequacy. We introduce a series of data-centric strategies aimed at enriching the training data for deep learning-based segmentation of landmark structures. Our approach augments the training samples through annotation-guided image intensity manipulation and style transfer to achieve better generalization than standard training procedures. These augmentations are applied in a balanced manner so that the model learns to process a diverse range of images produced by equipment from different vendors while retaining its efficacy on the original data. We present extensive numerical and visual results that demonstrate the superior generalization of our methods compared to standard training. For this evaluation, we consider a large dataset that includes mammography images produced by equipment from different vendors. Further, we present complementary results that show both the strengths and limitations of our methods across various scenarios. The accuracy and robustness demonstrated in the experiments suggest that our method is well-suited for integration into clinical practice.
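To make the idea of annotation-guided intensity manipulation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the segmentation mask assigns an integer label to each pixel, that intensities are normalized to [0, 1], and that the manipulation is a per-region gamma curve; the abstract specifies none of these details. The function names `region_gamma_augment` and `balanced_samples`, the gamma range, and the one-to-one ratio of original to augmented samples are all hypothetical choices made for illustration.

```python
import random


def region_gamma_augment(image, mask, gamma_by_label):
    """Apply a per-region gamma curve to a grayscale image with values in [0, 1].

    image: list of rows of floats.
    mask: list of rows of integer labels with the same shape as image.
    Labels missing from gamma_by_label are left unchanged (gamma = 1.0).
    """
    out = []
    for img_row, mask_row in zip(image, mask):
        out.append([
            pixel ** gamma_by_label.get(label, 1.0)
            for pixel, label in zip(img_row, mask_row)
        ])
    return out


def balanced_samples(pairs, labels, gamma_range=(0.5, 2.0), seed=0):
    """Yield each original (image, mask) pair followed by one augmented copy.

    Keeping every original alongside its augmented version is one simple
    (assumed) way to balance the training set.
    """
    rng = random.Random(seed)
    for image, mask in pairs:
        yield image, mask  # original sample, kept unchanged
        gammas = {label: rng.uniform(*gamma_range) for label in labels}
        # The annotation is reused as-is: only intensities change, not geometry.
        yield region_gamma_augment(image, mask, gammas), mask
```

Because the transformation changes only pixel intensities, the existing annotation remains valid for the augmented image, which is consistent with the claim that no additional manual labeling is needed.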
