Does Unsupervised Domain Adaptation Improve the Robustness of Amortized Bayesian Inference? A Systematic Evaluation
Lasse Elsemüller, Valentin Pratz, Mischa von Krause, Andreas Voss, Paul-Christian Bürkner, Stefan T. Radev
TL;DR
This work systematically assesses unsupervised domain adaptation (UDA) for amortized Bayesian inference (ABI), introducing two NPE-UDA variants (NPE-MMD and NPE-DANN) that align simulated and observed summary statistics. By formulating a trade-off objective that adds a domain-alignment term with weight $\lambda$, the study reveals that UDA can reduce extrapolation bias under likelihood misspecification but often harms performance under prior misspecification, with sensitivity to dimensionality and training dynamics. Across four experiments (Ricker, a 2D Gaussian benchmark, high-dimensional Bayesian denoising, and a large real-world diffusion-model task on IAT data), the results are nuanced and dataset-dependent: domain alignment improves some metrics, such as calibration and hidden-domain matching, yet can cause instability or information loss when the misalignment stems from prior misspecification. The findings emphasize careful consideration of misspecification types and hyperparameter tuning, and highlight the need for robust, interpretable evaluation protocols when applying NPE-UDA in real-world, high-stakes contexts.
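To make the trade-off objective concrete, the following is a minimal sketch of an inference loss plus a $\lambda$-weighted alignment penalty, using a squared maximum mean discrepancy (MMD) between simulated and observed summary statistics. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the fixed bandwidth, the biased estimator, and the function names `gaussian_kernel`, `mmd_squared`, and `uda_objective` are all hypothetical choices made here.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    # Pairwise squared Euclidean distances via broadcasting, shape (n, m).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_ss = gaussian_kernel(source, source, bandwidth).mean()
    k_tt = gaussian_kernel(target, target, bandwidth).mean()
    k_st = gaussian_kernel(source, target, bandwidth).mean()
    return k_ss + k_tt - 2.0 * k_st


def uda_objective(inference_loss, sim_summaries, obs_summaries, lam=1.0):
    """Inference loss plus a lambda-weighted domain-alignment penalty.

    `inference_loss` stands in for the usual NPE loss on simulated data
    (assumed to be computed elsewhere); `lam` plays the role of lambda.
    """
    return inference_loss + lam * mmd_squared(sim_summaries, obs_summaries)
```

With `lam=0` the objective reduces to the plain inference loss, while larger values push the summary network toward embeddings that look alike across domains. This is the mechanism that, according to the summary above, can discard information when the shift between domains comes from the prior rather than from the likelihood.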
Abstract
Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This is particularly true for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where models trained on simulated data are deployed on noisy real-world observations. Recent robust approaches employ unsupervised domain adaptation (UDA) to match the embedding spaces of simulated and observed data. However, the lack of comprehensive evaluations across different domain mismatches raises concerns about their reliability in high-stakes applications. We address this gap by systematically testing UDA approaches across a wide range of misspecification scenarios, both in silico and in practice. We demonstrate that aligning summary spaces between domains effectively mitigates the impact of unmodeled phenomena or noise. However, the same alignment mechanism can lead to failures under prior misspecification, a critical finding with practical consequences. Our results underscore the need for careful consideration of misspecification types when using UDA to increase the robustness of ABI.
