Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation
Lingyan Ran, Yali Li, Tao Zhuo, Shizhou Zhang, Yanning Zhang
TL;DR
This work tackles semi-supervised semantic segmentation (SSSS) by challenging the assumption that only intensity-based augmentations are beneficial for weak-to-strong consistency. It introduces Adaptive Spatial Augmentation (ASAug), a pluggable module that applies strong spatial perturbations (rotation and translation) whose strength is set per instance by an entropy-based adaptive weight computed from the weak-view predictions in a mean-teacher framework. A pixel-level consistency loss with spatial alignment (MSE between aligned predictions) complements the approach, enabling robust learning despite the mask shifts that spatial augmentation introduces. Across PASCAL VOC 2012, Cityscapes, and COCO, ASAug consistently improves existing methods and achieves state-of-the-art results, with ablations and qualitative analyses supporting its effectiveness.
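The spatially aligned consistency loss can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes integer-pixel translation only (no rotation), and the helper names `translate` and `aligned_mse` are hypothetical. The idea shown is that the weak-view prediction is moved by the same transform applied to the strong view, and the MSE is taken only over pixels that remain inside the image.

```python
import numpy as np

def _src_dst(shift, size):
    """Source/destination slices for shifting one axis by `shift` pixels."""
    if shift >= 0:
        return slice(0, max(size - shift, 0)), slice(min(shift, size), size)
    return slice(min(-shift, size), size), slice(0, max(size + shift, 0))

def translate(x, dy, dx):
    """Shift a (C, H, W) array by (dy, dx); return the result and a valid-pixel mask."""
    _, h, w = x.shape
    out = np.zeros_like(x)
    valid = np.zeros((h, w), dtype=bool)
    ys, yd = _src_dst(dy, h)
    xs, xd = _src_dst(dx, w)
    out[:, yd, xd] = x[:, ys, xs]
    valid[yd, xd] = True  # pixels that received real content
    return out, valid

def aligned_mse(weak_probs, strong_probs, dy, dx):
    """MSE between the strong prediction and the weak prediction moved into its frame.

    Assumption: the strong view was produced by translating the image by (dy, dx).
    """
    aligned, valid = translate(weak_probs, dy, dx)
    if not valid.any():
        return 0.0
    diff = (strong_probs - aligned) ** 2
    return float(diff[:, valid].mean())
```

In a real training loop the strong prediction would come from the student network on the transformed image; here any (C, H, W) array can stand in for it. Masking the out-of-frame region is a design choice of this sketch, so padded borders do not contribute to the loss.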
Abstract
In semi-supervised semantic segmentation (SSSS), data augmentation plays a crucial role in the weak-to-strong consistency regularization framework, as it enhances diversity and improves model generalization. Recent strong augmentation methods have primarily focused on intensity-based perturbations, which have minimal impact on the semantic masks. In contrast, spatial augmentations like translation and rotation have long been acknowledged for their effectiveness in supervised semantic segmentation tasks, but they are often ignored in SSSS. In this work, we demonstrate that spatial augmentation can also contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations. Furthermore, recognizing the variability among images, we propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy. Extensive experiments show that our proposed Adaptive Spatial Augmentation (ASAug) can be integrated as a pluggable module, consistently improving the performance of existing methods and achieving state-of-the-art results on benchmark datasets such as PASCAL VOC 2012, Cityscapes, and COCO.
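As a rough illustration of entropy-based per-instance adaptation, the sketch below computes a normalized mean entropy from a softmax prediction and maps it to rotation and translation magnitudes. The abstract does not give the exact formula, so several things here are assumptions: the direction of the mapping (confident, low-entropy images receive stronger augmentation), the linear scaling, and the hypothetical parameters `max_angle` and `max_shift`.

```python
import numpy as np

def normalized_entropy(probs, eps=1e-8):
    """Mean per-pixel entropy of a (C, H, W) softmax map, divided by log(C)."""
    num_classes = probs.shape[0]
    pixel_entropy = -np.sum(probs * np.log(probs + eps), axis=0)
    return float(pixel_entropy.mean() / np.log(num_classes))

def adaptive_strength(probs, max_angle=30.0, max_shift=0.2):
    """Hypothetical mapping from prediction entropy to augmentation magnitudes.

    Assumption (not from the source): lower entropy -> larger weight -> stronger
    rotation (degrees) and translation (fraction of image size).
    """
    weight = 1.0 - normalized_entropy(probs)
    weight = min(max(weight, 0.0), 1.0)  # clamp against numerical drift
    return weight * max_angle, weight * max_shift
```

With this mapping, a maximally uncertain prediction yields almost no spatial perturbation and a fully confident one yields the maximum; the opposite direction can be obtained by using the normalized entropy itself as the weight.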
