Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image Generation

Joshua Niemeijer, Jan Ehrhardt, Heinz Handels, Hristina Uzunova

TL;DR

This paper presents UnIACorN, a dual-conditioning diffusion framework that combines semantic-label control with uncertainty-based control to generate labeled data tailored to a downstream segmentation task. A Semantic-ControlNet is trained on labeled data and an Uncertainty-ControlNet on unlabeled data; fusing their guidance during diffusion yields diverse, high-uncertainty labeled samples that bridge domain gaps such as the one between Spectralis and Home-OCT images. On retinal OCT, the synthesized data improves segmentation performance over style-transfer baselines and yields favorable domain-adaptation metrics, and a street-scene experiment illustrates that the approach also applies to other domain shifts. The method makes efficient use of unlabeled data for task-relevant augmentation without explicitly learning a domain style, and it suggests avenues for iterative, multi-sample generation and continual uncertainty refinement in practical deployment.

Abstract

Generative models are a valuable tool for the controlled creation of high-quality image data. Controlled diffusion models such as ControlNet have allowed the creation of labeled distributions. Such synthetic datasets can augment the original training distribution when discriminative models, e.g., for semantic segmentation, are trained. However, this augmentation effect is limited, since ControlNets tend to reproduce the original training distribution. This work introduces a method to utilize data from unlabeled domains to train ControlNets by introducing the concept of uncertainty into the control mechanism. The uncertainty indicates that a given image was not part of the training distribution of a downstream task, e.g., segmentation. Thus, two types of control are engaged in the final network: an uncertainty control from an unlabeled dataset and a semantic control from the labeled dataset. The resulting ControlNet allows us to create annotated data with high uncertainty from the target domain, i.e., synthetic data from the unlabeled distribution with labels. In our scenario, we consider retinal OCTs, where typically high-quality Spectralis images are available with given ground truth segmentations, enabling the training of segmentation networks. The recent development of Home-OCT devices, however, yields retinal OCTs with lower quality and a large domain shift, such that off-the-shelf segmentation networks cannot be applied to this type of data. Synthesizing annotated images from the Home-OCT domain using the proposed approach closes this gap and leads to significantly improved segmentation results without adding any further supervision. The advantage of uncertainty guidance becomes obvious when compared to style transfer: it enables arbitrary domain shifts without any strict learning of an image style. This is also demonstrated in a traffic scene experiment.

Paper Structure

This paper contains 13 sections, 7 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: We aim at learning the uncertainties of a recorded but unlabeled distribution and generating new labeled data containing high uncertainties.
  • Figure 2: During inference, both the Semantic-ControlNet and the Uncertainty-ControlNet are evaluated in each diffusion step. A weighted sum of their outputs is then computed, allowing control over both the segmentation mask and the uncertainty of the generated image.
  • Figure 3: The Semantic-ControlNet is trained on the labeled Spectralis Domain. The Uncertainty-ControlNet is trained on both the Spectralis and Home-OCT domains.
  • Figure 4: The images generated by the Semantic-ControlNet match the distribution of the labeled images.
  • Figure 5: Normal distributions of the average per-image uncertainty of the segmentation network on the Spectralis and Home-OCT data. We sample from these distributions and feed the resulting uncertainty to the Uncertainty-ControlNet. The response to different uncertainties can be seen in the bottom row of images.
  • ...and 4 more figures
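
The inference scheme outlined in the captions of Figures 2 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fusion weights, the clipping at zero, and the mean and standard deviation used below are hypothetical placeholders, and the control outputs are plain lists of floats standing in for the feature tensors a real ControlNet would produce.

```python
import random
from typing import List, Sequence


def sample_uncertainty(mean: float, std: float, rng: random.Random) -> float:
    """Draw a scalar image-level uncertainty from a normal distribution.

    Clipping at zero is an assumption made here so the value stays
    non-negative; the source does not state how samples are bounded.
    """
    return max(0.0, rng.gauss(mean, std))


def fuse_controls(
    semantic: Sequence[float],
    uncertainty: Sequence[float],
    w_sem: float = 1.0,
    w_unc: float = 1.0,
) -> List[float]:
    """Weighted sum of the two control outputs for one diffusion step."""
    if len(semantic) != len(uncertainty):
        raise ValueError("control outputs must have the same length")
    return [w_sem * s + w_unc * u for s, u in zip(semantic, uncertainty)]


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical target-domain statistics (placeholders, not paper values).
    target_mean, target_std = 0.30, 0.05
    u = sample_uncertainty(target_mean, target_std, rng)

    # Stand-ins for the per-step outputs of the two ControlNets.
    semantic_out = [0.2, -0.1, 0.4]
    uncertainty_out = [u, u, u]

    fused = fuse_controls(semantic_out, uncertainty_out, w_sem=1.0, w_unc=0.5)
    print(f"sampled uncertainty: {u:.3f}")
    print("fused control:", fused)
```

In this sketch the two weights play the role of the trade-off described in Figure 2: a larger `w_unc` pushes generation toward the sampled uncertainty level, while a larger `w_sem` favors adherence to the segmentation mask.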