Does Unsupervised Domain Adaptation Improve the Robustness of Amortized Bayesian Inference? A Systematic Evaluation

Lasse Elsemüller, Valentin Pratz, Mischa von Krause, Andreas Voss, Paul-Christian Bürkner, Stefan T. Radev

TL;DR

This work systematically assesses unsupervised domain adaptation (UDA) for amortized Bayesian inference (ABI), introducing two NPE-UDA variants (NPE-MMD and NPE-DANN) that align simulated and observed summary statistics. By formulating a trade-off objective that adds a domain-alignment term with weight $\lambda$, the study reveals that UDA can reduce extrapolation bias under likelihood misspecification but often harms performance under prior misspecification, with sensitivity to dimensionality and training dynamics. Across four experiments (Ricker, a 2D Gaussian benchmark, high-dimensional Bayesian denoising, and a large real-world diffusion-model task on IAT data), the results are nuanced and dataset-dependent: domain alignment improves some metrics, such as calibration and hidden-domain matching, yet can cause instability or information loss when the misalignment stems from prior misspecification. The findings emphasize careful consideration of misspecification types and hyperparameter tuning, and highlight the need for robust, interpretable evaluation protocols when applying NPE-UDA in real-world, high-stakes contexts.
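
As a rough sketch of the trade-off objective described above, the combined loss can be written as a weighted sum. The symbols $\mathcal{L}_{\text{NPE}}$ (the standard posterior-estimation loss) and $\mathcal{L}_{\text{DA}}$ (the domain-alignment term, e.g. an MMD or adversarial loss) are notation assumed here for illustration and may differ from the paper's own:

$$
\mathcal{L}_{\text{NPE-UDA}} = \mathcal{L}_{\text{NPE}} + \lambda \, \mathcal{L}_{\text{DA}}, \qquad \lambda \geq 0 .
$$

Under this reading, $\lambda = 0$ recovers standard NPE, and larger values of $\lambda$ put more weight on aligning the simulated and observed summary distributions.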

Abstract

Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This is true in particular for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where models trained on simulated data are deployed on noisy real-world observations. Recent robust approaches employ unsupervised domain adaptation (UDA) to match the embedding spaces of simulated and observed data. However, the lack of comprehensive evaluations across different domain mismatches raises concerns about their reliability in high-stakes applications. We address this gap by systematically testing UDA approaches across a wide range of misspecification scenarios in silico and in practice. We demonstrate that aligning summary spaces between domains effectively mitigates the impact of unmodeled phenomena or noise. However, the same alignment mechanism can lead to failures under prior misspecifications, a critical finding with practical consequences. Our results underscore the need for careful consideration of misspecification types when using UDA to increase the robustness of ABI.

Paper Structure

This paper contains 65 sections, 13 equations, 19 figures, 6 tables.

Figures (19)

  • Figure 1: Schematic overview of NPE-UDA methods that combine neural posterior estimation (NPE) with unsupervised domain adaptation (UDA). Standard NPE training optimizes posterior approximation in a simulation-based training loop. NPE-UDA approaches introduce observed data into the training procedure, targeting performance improvements in the (possibly shifted) observed domain via domain alignment in summary space. NPE-MMD (maximum mean discrepancy) directly minimizes the distance between distributions, whereas NPE-DANN (domain-adversarial neural networks) uses adversarial competition between an auxiliary domain classifier and the summary network.
  • Figure 2: Experiment 3: Summary space domain distance (SSDD; MMD) vs. normalized root mean squared error (NRMSE) for row deletions. We observe a sweet spot of domain alignment without losing important information.
  • Figure 3: Experiment 1. Parameter space performance metrics resulting from $50$ separate Bayesian hyperparameter optimization runs per method. The solid trend lines represent the predictive mean of a Gaussian process regression fitted to the individual run results, with the shaded areas representing $95 \%$ confidence intervals of the predictive distribution. If a parameter was not optimized, the method's average performance is depicted by a dashed horizontal line. Lower values indicate better performance for all metrics but PC. NLL = Negative Log Likelihood. NRMSE = Normalized Root Mean Squared Error. ECE = Expected Calibration Error. PC = Posterior Contraction. Whereas learning rate optimization is mostly ineffective for improving performance under contamination misspecification, the domain alignment regularization parameter $\lambda$ controls a trade-off between error (NRMSE) on the one hand and calibration (ECE) and contraction (PC) on the other for NPE-UDA methods.
  • Figure 4: Experiment 1. Further metrics resulting from $50$ separate Bayesian hyperparameter optimization runs per method. The solid trend lines represent the predictive mean of a Gaussian process regression fitted to the individual run results, with the shaded areas representing $95 \%$ confidence intervals of the predictive distribution. If a parameter was not optimized, the method's average performance is depicted by a dashed horizontal line. Lower values indicate better performance for all metrics but SSDD. PPD = Posterior Predictive Distance (RMSE). INLD = Inference Network Latent Distance (MMD). SSDD = Summary Space Domain Distance.
  • Figure 5: Experiment 2. Performance metrics and summary space domain distance (SSDD) of the methods in all misspecification scenarios (columns), aggregated via the median of $10$ runs. The first row shows the well-specified setting, with misspecification increasing from top to bottom within each column. Metric values are centered at $0$ and normalized by each column's (i.e., each scenario's) maximum value, which is displayed below the metric name at the border of each radar plot. Lower values indicate better performance for all metrics but SSDD. $1 -$ PC $=$ $1 -$ Posterior Contraction. NRMSE = Normalized Root Mean Squared Error. SSDD (MMD) = Summary Space Domain Distance measured via MMD (not applicable for Analytic Posterior). PPD (resim) = Posterior Predictive Distance measured via the RMSE to resimulated data. ECE = Expected Calibration Error. NPE-UDA methods fail under prior misspecification but can be advantageous under contamination.
  • ...and 14 more figures
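
Several captions above report a summary space domain distance (SSDD) measured via MMD. The following is a minimal sketch of how such a distance between simulated and observed summary vectors could be computed. The Gaussian kernel, the fixed bandwidth, the biased (V-statistic) estimator, and the function names `gaussian_kernel` and `mmd_squared` are all assumptions made for illustration, not the authors' implementation:

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(s_sim, s_obs, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of summary vectors.

    s_sim: array of shape (n, d), summaries of simulated data (assumed layout).
    s_obs: array of shape (m, d), summaries of observed data (assumed layout).
    """
    k_ss = gaussian_kernel(s_sim, s_sim, bandwidth).mean()
    k_oo = gaussian_kernel(s_obs, s_obs, bandwidth).mean()
    k_so = gaussian_kernel(s_sim, s_obs, bandwidth).mean()
    return k_ss + k_oo - 2.0 * k_so


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    simulated = rng.normal(size=(128, 4))        # stand-in for simulated summaries
    shifted = rng.normal(size=(128, 4)) + 3.0    # stand-in for a shifted observed domain
    print("aligned domains:", mmd_squared(simulated, simulated))
    print("shifted domains:", mmd_squared(simulated, shifted))
```

In this sketch, identical sample sets give a distance of essentially zero, while a shifted observed domain gives a clearly positive value. In practice the bandwidth (or a mixture of bandwidths) would need to be chosen with care, since it determines which differences between domains the distance is sensitive to.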

Theorems & Definitions (1)

  • Definition 1