
Principal Component Clustering for Semantic Segmentation in Synthetic Data Generation

Felix Stillger, Frederik Hasecke, Tobias Meisen

TL;DR

The paper tackles generating semantic segmentation data without labeling by exploiting latent diffusion models. It introduces a pipeline that preserves per-head self-attention features, reduces them with PCA, and clusters them to form masks, then uses open-vocabulary cross-attention to assign classes, followed by a refinement stage based on the diffusion output. The approach enables class-agnostic segmentation from latents and achieves competitive mean IoU on Pascal VOC when trained with the synthetic data, highlighting a path toward scalable, label-free synthetic datasets for segmentation. Its significance lies in leveraging self- and cross-attention within a diffusion framework to produce usable segmentation maps from unlabeled synthetic data, potentially reducing reliance on costly annotated datasets.
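The core pipeline step above, condensing per-head self-attention features with PCA and clustering them into class-agnostic masks, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tensor layout, feature construction, and the `cluster_attention_features` helper are assumptions, and the paper's exact dimensionality and clustering settings may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_attention_features(self_attn, n_components=8, n_clusters=5, seed=0):
    """Cluster per-pixel self-attention features into class-agnostic segments.

    self_attn: array of shape (heads, H*W, H*W), the per-head self-attention
    maps at one latent resolution (hypothetical layout, for illustration).
    """
    heads, n_pix, _ = self_attn.shape
    # Concatenate each pixel's attention rows across all heads into one
    # long per-pixel feature vector (head-wise information is preserved).
    features = np.transpose(self_attn, (1, 0, 2)).reshape(n_pix, heads * n_pix)
    # Condense the high-dimensional features with PCA.
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    # K-means over the reduced features yields class-agnostic segment labels.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)
    return labels  # one cluster id per latent pixel


# Toy example: 4 attention heads over an 8x8 latent grid.
rng = np.random.default_rng(0)
attn = rng.random((4, 64, 64)).astype(np.float32)
attn /= attn.sum(axis=-1, keepdims=True)  # row-normalise like a softmax output
labels = cluster_attention_features(attn, n_components=8, n_clusters=3)
print(labels.shape)  # (64,)
```

In the full method these cluster labels would then be assigned classes via the cross-attention maps and refined against the generated image; those stages are omitted here.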

Abstract

This technical report outlines our method for generating a synthetic dataset for semantic segmentation using a latent diffusion model. Our approach eliminates the need for additional models specifically trained on segmentation data and is part of our submission to the CVPR 2024 workshop challenge "SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets". Our methodology uses self-attentions to facilitate a novel head-wise semantic information condensation, thereby enabling the direct acquisition of class-agnostic image segmentation from the Stable Diffusion latents. Furthermore, we employ non-prompt-influencing cross-attentions from text to pixel, thus facilitating the classification of the previously generated masks. Finally, we propose a mask refinement step using only the output image of Stable Diffusion.

Paper Structure

This paper contains 7 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Diagram of the Pipeline (figure adapted from OVAM [marcosmanchon2024openvocabulary])
  • Figure 2: Examples from our Submitted Dataset