HSMix: Hard and Soft Mixing Data Augmentation for Medical Image Segmentation

Danyang Sun, Fadi Dornaika, Nagore Barrena

TL;DR

Medical image segmentation suffers from data scarcity and overfitting. HSMix introduces a dual augmentation strategy that merges hard, contour-preserving superpixel-based CutMix with soft, saliency-guided superpixel Mixup, applied to both images and masks during training. The approach is model-agnostic and validated across multiple architectures and medical datasets, yielding consistent gains in DSC and JAC with manageable training overhead. By leveraging local structure and saliency, HSMix expands the augmentation space while preserving critical boundaries, offering practical benefits for diverse medical imaging tasks.
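The two reported metrics can be sketched as follows, assuming DSC and JAC are the Dice similarity coefficient and the Jaccard index computed on binary masks, as is usual in segmentation work. The function name `dice_and_jaccard` and the smoothing term `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def dice_and_jaccard(pred, target, eps=1e-8):
    """Return (DSC, JAC) for two binary masks of the same shape.

    DSC = 2|A ∩ B| / (|A| + |B|),  JAC = |A ∩ B| / |A ∪ B|.
    eps is an assumed smoothing term so that two empty masks score 1.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dsc = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    jac = (inter + eps) / (union + eps)
    return float(dsc), float(jac)

if __name__ == "__main__":
    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print(dice_and_jaccard(pred, target))
```

The two scores are monotonically related (JAC = DSC / (2 - DSC)), so they rank methods the same way on a single image but can differ once averaged over a dataset.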

Abstract

Due to the high cost of annotation or the rarity of some diseases, medical image segmentation is often limited by data scarcity and the resulting overfitting problem. Self-supervised learning and semi-supervised learning can mitigate the data scarcity challenge to some extent. However, both of these paradigms are complex and require either hand-crafted pretext tasks or well-defined pseudo-labels. In contrast, data augmentation is a relatively simple and straightforward way to address data scarcity, and it has led to significant improvements in image recognition tasks. However, the effectiveness of local image editing augmentation techniques for segmentation has been less explored. We propose HSMix, a novel approach to local image editing data augmentation involving hard and soft mixing for medical semantic segmentation. In our approach, a hard-augmented image is created by combining homogeneous regions (superpixels) from two source images. A soft mixing method further adjusts the brightness of these composed regions through brightness mixing based on locally aggregated pixel-wise saliency coefficients. The ground-truth segmentation masks of the two source images undergo the same mixing operations to generate the associated masks for the augmented images. Our method fully exploits both the prior contour and saliency information, thus preserving local semantic information in the augmented images while enriching the augmentation space with more diversity. Our method is a plug-and-play solution that is model-agnostic and applicable to a range of medical imaging modalities. Extensive experiments demonstrate its effectiveness in a variety of medical segmentation tasks. The source code is available at https://github.com/DanielaPlusPlus/HSMix.
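The hard mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The following are assumptions: a regular grid stands in for a real superpixel algorithm, each region is pasted independently with probability `p`, and the names `grid_labels` and `hard_mix` are hypothetical.

```python
import numpy as np

def grid_labels(h, w, cells):
    """Assumed stand-in for superpixels: label map of a cells x cells grid."""
    rows = np.arange(h) * cells // h
    cols = np.arange(w) * cells // w
    return rows[:, None] * cells + cols[None, :]

def hard_mix(x1, y1, x2, y2, labels, p=0.3, rng=None):
    """Paste randomly selected regions of (x2, y2) onto (x1, y1).

    labels : (H, W) integer region map (superpixels in the paper,
             a grid in this sketch).
    p      : probability of selecting each region (assumed independent).
    Returns the mixed image, the mixed mask and the binary mixing mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    ids = np.unique(labels)
    chosen = ids[rng.random(ids.size) < p]
    m_h = np.isin(labels, chosen)                # True where source 2 is used
    m_img = m_h[..., None] if x1.ndim == 3 else m_h
    x_h = np.where(m_img, x2, x1)                # same mask for the image ...
    y_h = np.where(m_h, y2, y1)                  # ... and for the GT mask
    return x_h, y_h, m_h

if __name__ == "__main__":
    x1, x2 = np.zeros((8, 8)), np.ones((8, 8))
    y1, y2 = np.zeros((8, 8), int), np.ones((8, 8), int)
    labels = grid_labels(8, 8, 4)
    x_h, y_h, m_h = hard_mix(x1, y1, x2, y2, labels, rng=np.random.default_rng(0))
    print(m_h.astype(int))
```

Because whole regions are swapped rather than rectangles, every pixel of a region comes from the same source image, and the mask `y_h` stays aligned with `x_h` by construction.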

Paper Structure

This paper contains 19 sections, 8 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Visualization of the augmentation (augmented images and corresponding GT masks) from representative methods. Our method in (c) can preserve more boundaries and saliency information in a more diverse space. $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$ denote two original images, and $\mathbf{Y}_{1}$ and $\mathbf{Y}_{2}$ denote the corresponding ground truth masks.
  • Figure 2: The process of hard mixing. $\mathbf{X}_{1}$, $\mathbf{X}_{2}$ and $\mathbf{Y}_{1}$, $\mathbf{Y}_{2}$ denote the two original images and their corresponding ground truth masks. $\mathbf{X}_{h}$ is the augmented image, $\mathbf{Y}_{h}$ is the ground truth mask of the augmented image. $\mathbf{M}_{h}$ is the binary mask for hard mixing.
  • Figure 3: The process of soft mixing. $\mathbf{X}_{1}$, $\mathbf{X}_{2}$ and $\mathbf{Y}_{1}$, $\mathbf{Y}_{2}$ denote the two original images and their corresponding ground truth masks. $\mathbf{X}_{s}$ is the augmented image, $\mathbf{Y}_{s}$ is the ground truth mask of the augmented image. $\mathbf{M}_{s}$ is the mask for soft mixing.
  • Figure 4: (a)(b)(f)(g) Training images; (c)(d)(e)(h)(i)(j) Generated augmented images with superpixel grids outlined in green. The numbers of superpixels, $l_{1}$ and $l_{2}$, one for each of the two training images, are fixed here for visualization.
  • Figure 5: Performance as a function of $p$, the selection probability of superpixels, on the ISIC2017 Task 1 and Glas datasets using the UNet model. The best performance is achieved when $p=0.3$.
  • ...and 2 more figures
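The soft mixing in Figure 3 can be illustrated in the same spirit. The sketch below is one plausible reading of "brightness mixing based on locally aggregated pixel-wise saliency coefficients", not the paper's formula. Several things are assumed: saliency maps are given as inputs, aggregation is a per-region mean, the blend weight is the normalized saliency of the second source, and images are single-channel. The names `region_mean` and `soft_mix` are hypothetical.

```python
import numpy as np

def region_mean(values, labels):
    """Average `values` inside each region and broadcast back to (H, W).

    labels must hold non-negative integers (one id per region).
    """
    flat = labels.ravel()
    sums = np.bincount(flat, weights=values.ravel())
    counts = np.bincount(flat)
    return (sums / np.maximum(counts, 1))[labels]

def soft_mix(x1, y1, x2, y2, labels, sal1, sal2, m_h, eps=1e-8):
    """Blend source 2 into source 1 inside the regions selected by m_h.

    sal1, sal2 : (H, W) non-negative saliency maps (assumed given).
    m_h        : (H, W) boolean mask of regions chosen for mixing.
    The weight of source 2 in a region is its share of the aggregated
    saliency (an assumption of this sketch).
    """
    a1 = region_mean(sal1, labels)
    a2 = region_mean(sal2, labels)
    w2 = a2 / (a1 + a2 + eps)
    m_s = np.where(m_h, w2, 0.0)                 # soft mask with values in [0, 1]
    x_s = (1.0 - m_s) * x1 + m_s * x2            # brightness blend of the images
    y_s = (1.0 - m_s) * y1 + m_s * y2            # same blend gives a soft GT mask
    return x_s, y_s, m_s

if __name__ == "__main__":
    labels = np.zeros((4, 4), int)
    labels[:, 2:] = 1                            # two regions: left and right half
    x1, x2 = np.zeros((4, 4)), np.ones((4, 4))
    y1, y2 = np.zeros((4, 4)), np.ones((4, 4))
    sal1, sal2 = np.ones((4, 4)), 3.0 * np.ones((4, 4))
    x_s, y_s, m_s = soft_mix(x1, y1, x2, y2, labels, sal1, sal2, labels == 1)
    print(np.round(x_s, 2))
```

Under these assumptions the mixed mask `y_s` holds fractional values, so it would be paired with a loss that accepts soft targets.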