Revisiting Data Scaling Law for Medical Segmentation
Yuetan Chu, Zhongyi Han, Gongning Luo, Xin Gao
TL;DR
The paper investigates how segmentation performance scales with training data size in medical imaging across 15 tasks and 4 modalities, validating a power-law relationship using BCE loss with Res-UNet and Swin-UNet backbones. To improve data efficiency, it applies deformation-based augmentation grounded in topological principles, comparing random elastic deformation (RED), registration-based augmentation (RegDA), and generated deformation augmentation (GenDA). RegDA and GenDA accelerate convergence and reduce data requirements, with GenDA providing the strongest gains without any additional external data. The findings suggest that topology-aware augmentation can surpass standard power-law scaling trends, enabling more efficient, lower-cost medical segmentation models. Limitations include the restriction to 2D experiments and the need for validation in 3D and on more diverse pathologies.
Abstract
The population loss of trained deep neural networks often exhibits power-law scaling with the size of the training dataset, an observation that has guided significant performance advances in deep learning applications. In this study, we focus on the scaling relationship with data size in medical anatomical segmentation, a domain that remains underexplored. We analyze scaling laws for anatomical segmentation across 15 semantic tasks and 4 imaging modalities, demonstrating that larger datasets significantly improve segmentation performance and that the tasks follow similar scaling trends. Motivated by the topological isomorphism among images that share anatomical structures, we evaluate the impact of deformation-guided augmentation strategies on data scaling laws, specifically random elastic deformation and registration-guided deformation. We also propose a novel, scalable image augmentation approach that generates diffeomorphic mappings from a geodesic subspace based on image registration to introduce realistic deformations. Our experimental results demonstrate that both registered and generated deformation-based augmentation considerably enhance data utilization efficiency. Notably, the proposed generated deformation method achieves superior performance and accelerated convergence, surpassing standard power-law scaling trends without requiring additional data. Overall, this work provides insight into segmentation scalability and the impact of topological variation in medical imaging, supporting more efficient model development with reduced annotation and computational costs.
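
To make the notion of power-law scaling concrete, the following is a minimal sketch of how a scaling exponent could be estimated from (dataset size, loss) pairs. It assumes a pure power law of the form loss ≈ a · N^(-b) with no irreducible-loss term, which is a simplification and not necessarily the functional form fitted in the paper. The function names `fit_power_law` and `predict_loss` are hypothetical, and the numbers are synthetic values chosen for illustration, not results from the paper.

```python
import numpy as np


def fit_power_law(n_samples, losses):
    """Fit loss ≈ a * n^(-b) by ordinary least squares in log-log space.

    Assumes a pure power law (no irreducible-loss offset), so that
    log(loss) = log(a) - b * log(n) is a straight line.
    """
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    slope, intercept = np.polyfit(log_n, log_l, deg=1)
    a = float(np.exp(intercept))
    b = float(-slope)
    return a, b


def predict_loss(n, a, b):
    """Evaluate the fitted power law at dataset size(s) n."""
    return a * np.asarray(n, dtype=float) ** (-b)


# Synthetic illustration only: losses generated from a known power law.
sizes = np.array([50, 100, 200, 400, 800, 1600], dtype=float)
synthetic_losses = 2.0 * sizes ** (-0.35)

a_hat, b_hat = fit_power_law(sizes, synthetic_losses)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Under this simplified reading, an augmentation strategy that improves data efficiency would appear as a curve lying below the baseline fit at the same dataset size, or as a steeper slope on a log-log plot.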
