AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images

Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Prabhu, Judy Hoffman

TL;DR

AUGCAL addresses the miscalibration and overconfidence that persist in Sim2Real adaptation by introducing a training-time patch that combines strong augmentation of synthetic images with a calibration loss on the augmented predictions. By augmenting source (synthetic) images with an Aug transformation (e.g., Pasta, RandAugment) and optimizing a calibration loss (e.g., DCA) alongside the standard UDA objectives, AUGCAL tightens an upper bound on target calibration error that comprises both a domain divergence term and a source calibration term. Empirically, AUGCAL improves calibration metrics (ECE, IC-ECE, OC) and reliability (PRR) across semantic segmentation and object recognition benchmarks, while preserving or improving transfer performance across multiple base methods (Entropy Minimization, HRDA, SDAT) and backbones (CNNs and Transformers). The approach is lightweight and task-agnostic, which makes it practical for deploying more reliable Sim2Real models in real-world settings.
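As a rough illustration of the ECE metric mentioned above, the following sketch computes the standard equal-width-bin expected calibration error from per-sample confidences and correctness flags. The function name, the bin count of 10, and the half-open bin convention are assumptions made for illustration; the paper's exact evaluation protocol (and its IC-ECE and OC variants) may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin weight) * |accuracy - mean confidence|.

    confidences: iterable of top-class probabilities in [0, 1].
    correct:     iterable of 0/1 (or bool) flags, 1 if the prediction was right.
    n_bins:      number of equal-width bins (10 is an assumed default).
    """
    confidences = list(confidences)
    correct = list(correct)
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each sample to a bin [k / n_bins, (k + 1) / n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(hit)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

An overconfident model, in the sense used here, is one whose mean confidence exceeds its accuracy within the high-confidence bins, which inflates this score.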

Abstract

Synthetic data (SIM) drawn from simulators have emerged as a popular alternative for training models where acquiring annotated real-world images is difficult. However, transferring models trained on synthetic images to real-world applications can be challenging due to appearance disparities. A commonly employed solution to counter this SIM2REAL gap is unsupervised domain adaptation, where models are trained using labeled SIM data and unlabeled REAL data. Mispredictions made by such SIM2REAL adapted models are often associated with miscalibration, stemming from overconfident predictions on real data. In this paper, we introduce AUGCAL, a simple training-time patch for unsupervised adaptation that improves SIM2REAL adapted models by (1) reducing overall miscalibration, (2) reducing overconfidence in incorrect predictions, and (3) improving confidence score reliability by better guiding misclassification detection, all while retaining or improving SIM2REAL performance. Given a base SIM2REAL adaptation algorithm, at training time, AUGCAL involves replacing vanilla SIM images with strongly augmented views (AUG intervention) and additionally optimizing for a training-time calibration loss on augmented SIM predictions (CAL intervention). We motivate AUGCAL using a brief analytical justification of how to reduce miscalibration on unlabeled REAL data. Through our experiments, we empirically show the efficacy of AUGCAL across multiple adaptation methods, backbones, tasks, and shifts.
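The two interventions described above can be sketched, under several assumptions, as a source-side loss: a supervised term computed on predictions for augmented SIM images (AUG), plus a weighted calibration penalty on those same predictions (CAL). The sketch assumes that the calibration loss is a DCA-style penalty, taken here to be the absolute gap between mean confidence and mean accuracy over a batch. The names `dca_penalty`, `augcal_source_loss`, and `cal_weight` are hypothetical, and the base method's adaptation loss on unlabeled REAL data is deliberately left out.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class over a batch."""
    return -sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)) / len(labels)


def dca_penalty(probs, labels):
    """Assumed DCA-style penalty: |mean accuracy - mean confidence| over the batch."""
    confidences = [max(p) for p in probs]
    hits = [1.0 if p.index(max(p)) == y else 0.0 for p, y in zip(probs, labels)]
    return abs(sum(hits) / len(hits) - sum(confidences) / len(confidences))


def augcal_source_loss(probs_on_augmented_sim, labels, cal_weight=1.0):
    """Source-side loss on augmented SIM predictions.

    probs_on_augmented_sim: per-sample class probabilities predicted for
        AUGMENTED synthetic images (the AUG intervention happens upstream).
    labels: integer SIM ground-truth labels.
    cal_weight: hypothetical weight on the calibration term (CAL intervention).
    """
    supervised = cross_entropy(probs_on_augmented_sim, labels)
    calibration = dca_penalty(probs_on_augmented_sim, labels)
    return supervised + cal_weight * calibration
```

In a full training loop this value would be added to whatever unsupervised objective the base adaptation method computes on REAL images; setting `cal_weight=0` recovers the AUG-only variant.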

Paper Structure

This paper contains 25 sections, 16 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Overconfident Sim2Real mispredictions. [Left] We show an example of what we mean by overconfident mispredictions. For Sim2Real adaptation on GTAV$\to$Cityscapes, we choose (DAFormer) HRDA + MIC [hoyer2022mic] and EntMin + MIC [vu2019advent] (highly performant Sim2Real methods) and show erroneous predictions on Cityscapes (bottom row). We can see that the model identifies sidewalk pixels as road (2nd column) and fence pixels as wall (3rd column) with very high confidence. [Right] We show how pervasive this "overconfidence" phenomenon is. While better Sim2Real adapted models -- from (DAFormer) Source-Only [hoyer2022hrda] to (DAFormer) EntMin + MIC [vu2019advent] to (DAFormer) HRDA + MIC [hoyer2022mic] -- exhibit improved transfer performance [Top, Right], they also exhibit increased overconfidence in mispredictions [Bottom, Right], affecting prediction reliability.
  • Figure 2: AugCal pipeline. AugCal consists of two key interventions on an existing Sim2Real adaptation method. First, source Sim images are augmented via an Aug transform, and supervised losses for Sim images are computed on the augmented image predictions. Additionally, AugCal optimizes for a calibration loss on the augmented Sim predictions.
  • Figure 3: AugCal increases the proportion of "accurate" and "certain" predictions. For (DAFormer) HRDA + MIC (row 1) and EntMin + MIC (row 2) on GTAV$\to$Cityscapes, we show how different interventions affect the proportion of "accurate" and "certain" (confidence $> 0.95$) predictions (indicated in gray per column). Regions in black do not satisfy the "accurate" and "certain" filtering criteria. We see that, compared to a base adaptation method, AugCal increases the proportion of highly confident correct predictions (green boxes). Aug and Cal applied alone can potentially reduce that proportion (yellow boxes). Aug is Pasta, Cal is DCA.
  • Figure 4: AugCal increases the proportion of "accurate" and "certain" predictions. For a base DAFormer SemSeg model trained with HRDA + MIC (state-of-the-art) on GTAV$\to$Cityscapes, we show how applying AugCal at training time (right) can improve the proportion of "accurate" and "certain" (confidence $> 0.95$) predictions over the vanilla adaptation method (left). Regions in black do not satisfy the "accurate" and "certain" filtering criteria.
  • Figure 5: AugCal increases the proportion of "accurate" and "certain" predictions. For a base DAFormer SemSeg model trained with EntMin + MIC on GTAV$\to$Cityscapes, we show how applying AugCal at training time (right) can improve the proportion of "accurate" and "certain" (confidence $> 0.95$) predictions over the vanilla adaptation method (left). Regions in black do not satisfy the "accurate" and "certain" filtering criteria.
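
The "accurate" and "certain" filtering used in Figures 3 to 5 can be sketched as a simple mask over predictions: a prediction is kept only if it matches the label and its confidence exceeds 0.95. The function name and the flat-list inputs are assumptions for illustration; for segmentation, each pixel would be one entry, and the handling of ignored or unlabeled pixels is not modeled here.

```python
def accurate_and_certain_fraction(confidences, predictions, labels, threshold=0.95):
    """Fraction of predictions that are both correct and above the confidence threshold.

    confidences: top-class probability per prediction (per pixel for segmentation).
    predictions: predicted class index per prediction.
    labels:      ground-truth class index per prediction.
    threshold:   confidence cutoff; 0.95 follows the figure captions.
    """
    if not (len(confidences) == len(predictions) == len(labels)):
        raise ValueError("all inputs must have the same length")
    if len(labels) == 0:
        raise ValueError("inputs must be non-empty")

    # True where the prediction is correct AND confidence is strictly above the cutoff.
    mask = [
        (conf > threshold) and (pred == label)
        for conf, pred, label in zip(confidences, predictions, labels)
    ]
    return sum(mask) / len(mask)
```

Comparing this fraction between a base adaptation method and its AugCal-trained counterpart corresponds to the comparison the captions describe visually.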