R&D: Balancing Reliability and Diversity in Synthetic Data Augmentation for Semantic Segmentation

Huy Che, Dinh-Duy Phan, Duc-Khai Lam

Abstract

Collecting and annotating datasets for pixel-level semantic segmentation is highly labor-intensive. Data augmentation offers a viable alternative by improving model generalization without additional real-world data collection. Traditional augmentation techniques, such as translation, scaling, and color transformations, vary the geometry and appearance of existing images but cannot generate new structures. Generative models have been used to extend the semantic content of datasets, but they often struggle to keep generated images consistent with the originals, which is a particular problem for pixel-level tasks. In this work, we propose a novel synthetic data augmentation pipeline that integrates controllable diffusion models. Our approach balances the diversity and reliability of the synthetic data, effectively bridging the gap between synthetic and real data. We further use class-aware prompting and visual prior blending to improve image quality and ensure precise alignment with the segmentation labels. Evaluating on benchmark datasets such as PASCAL VOC and BDD100K, we demonstrate that our method significantly enhances semantic segmentation performance, especially in data-scarce scenarios, while improving model robustness in real-world applications. Our code is available at https://github.com/chequanghuy/Enhanced-Generative-Data-Augmentation-for-Semantic-Segmentation-via-Stronger-Guidance.

Paper Structure (25 sections, 4 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Our proposed synthetic data augmentation pipeline utilizes the real dataset $\mathcal{D}_0$ to create two synthetic datasets, $\mathcal{D}_1^{gen}$ and $\mathcal{D}_2^{gen}$. The annotations for the synthetic data are directly copied from the labels of the real dataset.
  • Figure 2: Some examples of text prompt selection for input images show that simple text prompts are often too simplistic, while generated captions may miss some labeled classes. Class-prompt appending addresses this but can lead to incoherent prompts. In contrast, conditional image captioning creates coherent prompts that accurately describe the image and include all labeled classes.
  • Figure 3: Image generation using the Img2Img Controllable Diffusion Model.
  • Figure 4: Image generation using the Controllable Inpainting Diffusion Model.
  • Figure 5: Quantitative comparison (FID / CLIP Score (ViT-B/32)).
  • ...and 1 more figures
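
The Figure 2 caption contrasts several prompt-selection strategies. As a rough illustration of class-prompt appending only, the Python sketch below adds labeled class names that a generated caption fails to mention. The function name `append_missing_classes`, the substring matching rule, and the comma-separated output format are assumptions made for this example, not the authors' implementation.

```python
def append_missing_classes(caption: str, labeled_classes: list[str]) -> str:
    """Append labeled class names that the caption does not mention.

    Hypothetical helper: matching is a naive case-insensitive substring
    check, so synonyms (e.g. "man" for "person") count as missing.
    """
    lowered = caption.lower()
    missing = [name for name in labeled_classes if name.lower() not in lowered]
    if not missing:
        return caption
    # Strip a trailing period or space, then tack the class names on.
    return caption.rstrip(". ") + ", " + ", ".join(missing)


# Illustrative inputs: a generated caption and the classes in the label map.
caption = "A man riding a horse on a beach"
prompt = append_missing_classes(caption, ["person", "horse", "dog"])
print(prompt)
```

In this example the appended names ("person", "dog") cover every labeled class but read as a tacked-on list, which is the incoherence the caption attributes to class-prompt appending and that conditional image captioning is meant to avoid.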