Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation

Lingyan Ran, Yali Li, Tao Zhuo, Shizhou Zhang, Yanning Zhang

TL;DR

This work tackles semi-supervised semantic segmentation by challenging the assumption that only intensity-based augmentations are beneficial for weak-strong consistency. It introduces Adaptive Spatial Augmentation (ASAug), a pluggable module that applies strong spatial perturbations (rotation and translation) whose intensity per instance is governed by an entropy-based adaptive weight, computed from the weak predictions in a mean-teacher framework. A pixel-level consistency loss with spatial alignment (MSE across aligned predictions) complements the approach, enabling robust learning despite mask shifts. Across VOC 2012, Cityscapes, and COCO, ASAug delivers state-of-the-art improvements, demonstrating the value of incorporating spatial augmentations into SSSS and providing thorough ablations and qualitative analyses to validate its effectiveness.
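The spatially aligned consistency loss mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the function names `translate` and `aligned_mse`, the integer-pixel shift, and the choice to skip pixels that have no source after the shift are all assumptions made for illustration. The idea shown is only that the teacher's prediction receives the same spatial transform as the student's input before the two are compared with MSE.

```python
def translate(grid, dx, dy, fill=None):
    """Shift a 2D grid by (dx, dy) pixels; cells with no source get `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source coordinates before the shift
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out


def aligned_mse(teacher_pred, student_pred, dx, dy):
    """MSE between the student's prediction and the teacher's prediction
    after the teacher map is shifted by the same (dx, dy).

    Assumption: pixels shifted in from outside the image are ignored.
    """
    aligned = translate(teacher_pred, dx, dy)
    sq_err, count = 0.0, 0
    for t_row, s_row in zip(aligned, student_pred):
        for t, s in zip(t_row, s_row):
            if t is not None:
                sq_err += (t - s) ** 2
                count += 1
    return sq_err / count if count else 0.0


# Toy example: the "student" sees the image shifted one pixel to the right.
teacher = [[0.1, 0.2], [0.3, 0.4]]
student = translate(teacher, 1, 0, fill=0.0)
loss_aligned = aligned_mse(teacher, student, 1, 0)    # same shift applied
loss_unaligned = aligned_mse(teacher, student, 0, 0)  # no alignment
```

In this toy case the aligned loss vanishes because the student's map is an exact shifted copy of the teacher's, while the unaligned comparison penalizes the shift itself rather than any real disagreement.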

Abstract

In semi-supervised semantic segmentation (SSSS), data augmentation plays a crucial role in the weak-to-strong consistency regularization framework, as it enhances diversity and improves model generalization. Recent strong augmentation methods have primarily focused on intensity-based perturbations, which have minimal impact on the semantic masks. In contrast, spatial augmentations like translation and rotation have long been acknowledged for their effectiveness in supervised semantic segmentation tasks, but they are often ignored in SSSS. In this work, we demonstrate that spatial augmentation can also contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations. Furthermore, recognizing the variability among images, we propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy. Extensive experiments show that our proposed Adaptive Spatial Augmentation (ASAug) can be integrated as a pluggable module, consistently improving the performance of existing methods and achieving state-of-the-art results on benchmark datasets such as PASCAL VOC 2012, Cityscapes, and COCO.
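To make the idea of entropy-driven, per-instance augmentation concrete, here is a minimal sketch. The abstract does not give the formula for the adaptive weight, so everything specific here is an assumption: the helper names `mean_normalized_entropy` and `adaptive_angle`, the `max_angle_deg` parameter, and in particular the direction of the mapping (confident, low-entropy images receive a larger rotation). It only shows how a per-image entropy score could be turned into an augmentation magnitude.

```python
import math


def mean_normalized_entropy(probs):
    """Mean Shannon entropy over pixels, scaled to [0, 1].

    probs: list of per-pixel class-probability lists, each summing to 1.
    """
    num_classes = len(probs[0])
    max_entropy = math.log(num_classes)  # entropy of a uniform prediction
    total = 0.0
    for p in probs:
        total += -sum(q * math.log(q) for q in p if q > 0.0)
    return total / (len(probs) * max_entropy)


def adaptive_angle(probs, max_angle_deg=30.0):
    """Hypothetical mapping from entropy to a rotation magnitude.

    Assumption: the maximum angle is scaled by confidence (1 - entropy),
    so uncertain images are perturbed less. The paper's rule may differ.
    """
    return max_angle_deg * (1.0 - mean_normalized_entropy(probs))


# Two toy "images" with two pixels and two classes each.
confident = [[1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
angle_confident = adaptive_angle(confident)
angle_uncertain = adaptive_angle(uncertain)
```

Under this assumed mapping, the confident image is assigned the full rotation range and the uniformly uncertain image is assigned almost none.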

Paper Structure

This paper contains 15 sections, 7 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison with SOTA methods on the Pascal VOC 2012 dataset. Notably, our method outperforms other approaches in all partitioning scenarios.
  • Figure 2: Comparison between intensity and spatial augmentations. (a) Intensity-based augmentations, such as color, blur, brightness, and contrast adjustments, modify pixel appearance without changing the spatial positions of the original mask; (b) spatial augmentations, such as rotation and translation, directly change the positions of pixels, leading to inconsistent masks between the original image and the augmented image.
  • Figure 3: Illustration of our ASAug pipeline. Building on the teacher-student consistency training framework [ouali2020overview], we introduce an adaptive method that selectively applies geometric distortions to images as strong augmentations, based on their reliability and importance. "EAW" denotes our entropy-based adaptive weight. Because geometric transformations change pixel locations, we apply the same operation to the teacher's output so that the predictions before and after augmentation remain aligned.
  • Figure 4: Comparison of EAW with direct spatial augmentations. (a) EAW vs. a fixed rotation angle; (b) EAW vs. a fixed translation ratio (based on AllSpark [wang2024allspark]).
  • Figure 5: Ablation study on the EAW hyper-parameters $k_r$ and $k_t$, trained on the 1464 partition with $d_t=d_r=1.0$ (based on AllSpark [wang2024allspark]).
  • ...and 3 more figures