Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets

Hannah Kniesel, Pedro Hermosilla, Timo Ropinski

TL;DR

This work explores three active learning metrics (uncertainty, query by committee, and expected model change), demonstrates their application for guiding the sample generation process through gradient approximation, and shows that segmentation models trained with guided synthetic data outperform those trained on non-guided synthetic data.

Abstract

Recent advances in conditional image generation with diffusion models have shown great potential for achieving impressive image quality while preserving the constraints introduced by the user. In particular, ControlNet enables precise alignment between ground truth segmentation masks and the generated image content, which makes it possible to augment training datasets for segmentation tasks. This raises a key question: Can ControlNet additionally be guided to generate the most informative synthetic samples for a specific task? Inspired by active learning, where the most informative real-world samples are selected based on sample difficulty or model uncertainty, we propose the first approach to integrate active learning-based selection metrics into the backward diffusion process for sample generation. Specifically, we explore uncertainty, query by committee, and expected model change, which are commonly used in active learning, and demonstrate their application for guiding the sample generation process through gradient approximation. Our method is training-free: it modifies only the backward diffusion process, so it can be used with any pretrained ControlNet. Using this process, we show that segmentation models trained with guided synthetic data outperform those trained on non-guided synthetic data. Our work underscores the need for advanced control mechanisms for diffusion-based models that are aligned not only with image content but also with downstream task performance, highlighting the true potential of synthetic data generation.
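
As a rough illustration of two of the selection metrics named in the abstract, the sketch below computes a per-pixel uncertainty score (predictive entropy) and a query-by-committee disagreement score (entropy of the averaged prediction minus the mean entropy of the members). These are common textbook formulations and are assumptions here; the abstract does not state which exact variants the paper uses. The function names are hypothetical, and expected model change is left out because it needs gradients of a trained model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_distribution(committee):
    """Average the class distributions predicted by all committee members."""
    n_members = len(committee)
    n_classes = len(committee[0])
    return [sum(member[c] for member in committee) / n_members
            for c in range(n_classes)]


def committee_disagreement(committee):
    """Entropy of the consensus minus the mean entropy of the members.

    The score is zero when all members agree and grows as their
    predictions diverge. This is one possible disagreement measure,
    chosen for illustration only.
    """
    consensus = mean_distribution(committee)
    mean_member_entropy = sum(entropy(m) for m in committee) / len(committee)
    return entropy(consensus) - mean_member_entropy


# Class probabilities for a single pixel from two hypothetical models.
confident_but_split = [[1.0, 0.0], [0.0, 1.0]]
print(entropy([0.5, 0.5]))
print(committee_disagreement(confident_but_split))
```

In an active learning setting these scores rank existing samples. The abstract's idea is to use such scores as an objective during generation instead, so the sampler is steered toward images that score high.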

Paper Structure

This paper contains 34 sections, 6 equations, 9 figures, 2 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Our proposed iterative data generation and model refinement pipeline introducing active learning inspired ControlNet guidance: A segmentation model trained on real data guides ControlNet to generate informative, real-data-aligned samples, which are then added to the training set for model retraining. This active learning-inspired process refines the model through data generation.
  • Figure 2: Visualization of latent guidance with single-step denoising. For each denoising step, we approximate the clean image $\hat{x}_0$ by single-step denoising, so that we can apply the loss function to update the current latent $x_t$.
  • Figure 3: Qualitative comparison of different loss metrics during guidance. We visualize the real images (top row) as well as synthetically augmented images following kupyn2024dataset (second row) next to our proposed guidance in the following rows. Red borders outline the synthetically augmented object. The images share high visual quality.
  • Figure 4: The plots visualize the uncertainty of generated objects when applying guided backward diffusion using different metrics at varying guidance strengths. Uncertainty is measured according to the respective metric. As guidance strength increases, we expect uncertainty to rise accordingly. However, Monte Carlo dropout (MCD) introduces noise, making it unreliable for gradient-based optimization. This instability arises from its stochastic nature, which injects variance into the optimization process, ultimately reducing its effectiveness.
  • Figure 5: Visualization of failure cases of cross-entropy (CE) guidance. Guiding based on CE leads to an ill-posed optimization problem. It can encourage the generation of images that the model misclassifies, but it may just as well produce samples that contain incorrect classes.
  • ...and 4 more figures
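
The latent guidance described in the Figure 2 caption can be sketched roughly as follows. The single-step estimate uses the standard DDPM identity $\hat{x}_0 = (x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon) / \sqrt{\bar{\alpha}_t}$. Everything else is an assumption made for illustration: the "segmentation model" is a toy per-pixel sigmoid classifier, the loss is its mean binary entropy, the noise prediction `eps` is treated as constant when differentiating (one possible reading of "gradient approximation"), and the VAE decoding step of a latent diffusion model is omitted. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def predict_x0(x_t, eps, alpha_bar):
    """Single-step estimate of the clean sample from the noisy latent x_t."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(x_t, eps)]


def toy_uncertainty(x0, w=2.0):
    """Mean binary entropy of a toy per-pixel classifier p = sigmoid(w * x)."""
    total = 0.0
    for x in x0:
        p = 1.0 / (1.0 + math.exp(-w * x))
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total / len(x0)


def toy_uncertainty_grad(x0, w=2.0):
    """Analytic gradient of toy_uncertainty with respect to each entry of x0."""
    grads = []
    for x in x0:
        p = 1.0 / (1.0 + math.exp(-w * x))
        # dH/dp = -w * x and dp/dx = w * p * (1 - p) for the sigmoid classifier.
        grads.append(-(w * w) * x * p * (1.0 - p) / len(x0))
    return grads


def guided_latent(x_t, eps, alpha_bar, strength):
    """Shift x_t in the direction that raises the uncertainty of its x0 estimate."""
    x0_hat = predict_x0(x_t, eps, alpha_bar)
    # Assumed approximation: eps is held fixed, so d x0_hat / d x_t = 1 / sqrt(alpha_bar).
    scale = 1.0 / math.sqrt(alpha_bar)
    grad = [g * scale for g in toy_uncertainty_grad(x0_hat)]
    return [x + strength * g for x, g in zip(x_t, grad)]


# Toy two-pixel latent and a fixed (hypothetical) noise prediction.
x_t, eps, alpha_bar = [1.0, -0.5], [0.2, -0.1], 0.5
x_t_guided = guided_latent(x_t, eps, alpha_bar, strength=0.1)
print(toy_uncertainty(predict_x0(x_t, eps, alpha_bar)))
print(toy_uncertainty(predict_x0(x_t_guided, eps, alpha_bar)))
```

In a real sampler this update would be applied at each backward diffusion step before the usual denoising update, with `strength` playing the role of the guidance strength varied in Figure 4.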