Inductive Domain Transfer In Misspecified Simulation-Based Inference

Ortal Senouf, Antoine Wehenkel, Cédric Vincent-Cuaz, Emmanuel Abbé, Pascal Frossard

TL;DR

This paper tackles misspecified simulation-based inference by moving from RoPE's transductive domain transfer to FRISBI, a fully inductive and amortized framework. FRISBI jointly trains a mini-batch OT-based alignment with a supervised calibration objective and then amortizes the resulting OT-induced posterior using a conditional normalizing flow, enabling test-time inference without access to simulations. Across synthetic and real benchmarks, including complex medical biomarker estimation, FRISBI matches or surpasses RoPE and standard SBI methods in both accuracy (LPP) and calibration (ACAUC), while offering superior scalability and applicability in misspecified environments. The approach improves robustness to limited calibration data and label noise, highlighting its practical impact for scalable, reliable SBI in real-world, imperfect models.

Abstract

Simulation-based inference (SBI) is a statistical inference approach for estimating latent parameters of a physical system when the likelihood is intractable but simulations are available. In practice, SBI is often hindered by model misspecification--the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, a recent SBI approach, addresses this challenge through a two-stage domain transfer process that combines semi-supervised calibration with optimal transport (OT)-based distribution alignment. However, RoPE operates in a fully transductive setting, requiring access to a batch of test samples at inference time, which limits scalability and generalization. We propose here a fully inductive and amortized SBI framework that integrates calibration and distributional alignment into a single, end-to-end trainable model. Our method leverages mini-batch OT with a closed-form coupling to align real and simulated observations that correspond to the same latent parameters, using both paired calibration data and unpaired samples. A conditional normalizing flow is then trained to approximate the OT-induced posterior, enabling efficient inference without simulation access at test time. Across a range of synthetic and real-world benchmarks--including complex medical biomarker estimation--our approach matches or surpasses the performance of RoPE, as well as other standard SBI and non-SBI estimators, while offering improved scalability and applicability in challenging, misspecified environments.
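As a rough illustration of the alignment idea described above, the sketch below computes a mini-batch coupling between embeddings of real and simulated observations and turns each row into mixture weights over the simulated samples. It is a sketch under stated assumptions, not the authors' implementation: the paper's closed-form coupling is not given in this section, so entropic OT solved with Sinkhorn iterations is swapped in, and the random embeddings, the function name `sinkhorn_coupling`, and the values of `reg` and `n_iters` are all hypothetical.

```python
import numpy as np

def sinkhorn_coupling(cost, reg=0.1, n_iters=200):
    """Entropic OT coupling between uniform marginals via Sinkhorn iterations.

    Assumption: this stands in for the paper's closed-form coupling,
    whose exact form is not stated in this section.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on real samples
    b = np.full(m, 1.0 / m)          # uniform mass on simulated samples
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns toward marginal b
        u = a / (K @ v)              # scale rows to match marginal a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
z_real = rng.normal(size=(8, 4))     # hypothetical embeddings of real observations
z_sim = rng.normal(size=(16, 4))     # hypothetical embeddings of simulated observations

# Squared Euclidean cost between every real/simulated pair, scaled to [0, 1].
cost = ((z_real[:, None, :] - z_sim[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

P = sinkhorn_coupling(cost)

# Row-normalizing the coupling gives, for each real sample, weights over
# simulated samples; these could weight a mixture of simulated posteriors.
weights = P / P.sum(axis=1, keepdims=True)
```

In this toy setup, each row of `weights` sums to one and could serve as the mixing proportions of an OT-induced posterior for that real observation, which a conditional density estimator would then be trained to approximate.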

Paper Structure

This paper contains 27 sections, 12 equations, 7 figures, 1 table, 2 algorithms.

Figures (7)

  • Figure 1: FRISBI Overview. Similar to RoPE (Wehenkel et al., 2024), we assume a trained neural statistics encoder (NSE), $h_{\omega^\star}$, that maps simulation data ${\bm{x}}_s$ to embeddings $h_{\omega^\star}({\bm{x}}_s)$, and a neural posterior estimator (NPE), $q_{\psi^\star}$, which estimates simulated posterior distributions. FRISBI performs: (1) Joint optimal transport (OT) and supervised learning. Both paired and unpaired samples contribute to the OT plan (dashed lines), weighted by $\alpha_j$. Supervised samples from $\mathcal{D}_{calib}$ (solid lines) anchor the OT matching. The real-observations encoder $g_{\phi}$ is fine-tuned to optimize representations for both supervised learning and OT-based domain transfer. (2) A conditional density estimator, $q_{\xi}$, approximates the posterior arising from the OT-based mixture of posteriors.
  • Figure 2: Results across different calibration set sizes. The top row shows performance in terms of LPP ($\uparrow$) and the bottom row shows the calibration metric ACAUC ($\rightarrow 0 \leftarrow$). The horizontal axis indicates the calibration set size and the vertical axis the metric value. Baselines that do not rely on a calibration set are shown as horizontal dashed lines for easier comparison.
  • Figure 3: Ablation Analysis. Comparison of joint training only, solution amortization only, and the full pipeline. The horizontal axis shows the number of calibration samples, while the vertical axis represents the LPP ($\uparrow$, top) and ACAUC ($\rightarrow 0 \leftarrow$, bottom) scores.
  • Figure 4: Label Noise Robustness. Impact of increasing label noise on performance, measured by LPP ($\uparrow$, top) and ACAUC ($\rightarrow 0 \leftarrow$, bottom), across three calibration set sizes (noted above each panel) on the light tunnel benchmark. The horizontal axis represents the noise rate, while the vertical axis shows the metric score.
  • Figure 5: Cardiac Biomarker Estimation. Performance comparison across all baselines for heart rate (HR) and cardiac output (CO) estimation, using a calibration set of 200 samples. On the right, an example of real and simulated arterial pulse waveforms is shown.
  • ...and 2 more figures