Flow Matching for Robust Simulation-Based Inference under Model Misspecification

Pierre-Louis Ruhlmann, Pedro L. C. Rodrigues, Michael Arbel, Florence Forbes

TL;DR

This work introduces Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of real calibration samples and consistently mitigates the effects of misspecification.

Abstract

Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification: simulators are only approximations of reality, and mismatches between simulated and real data can yield biased or overconfident posteriors. We address this issue by introducing Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of real calibration samples. Our approach proceeds in two stages: first, a posterior approximator is trained on abundant simulated data; second, flow matching transports its predictions toward the true posterior supported by real observations, without requiring explicit knowledge of the misspecification. This design enables FMCPE to combine the scalability of SBI with robustness to distributional shift. Across synthetic benchmarks and real-world datasets, we show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty calibration compared to standard SBI baselines, while remaining computationally efficient.
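
The second stage described above can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: the Gaussian stand-ins for the simulation-trained posterior and the calibration parameters, the linear-in-features velocity field, and the names `features` and `transport` are all assumptions made for illustration, and the conditioning on observations is omitted. It shows only the generic flow matching recipe: regress a velocity field onto the displacement between paired source and target samples along a straight-line path, then integrate that field to move source samples toward the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 1000  # hypothetical number of (source, target) training pairs

# Stand-in for samples from a biased, simulation-trained posterior (assumption).
theta_sim = rng.normal(loc=0.0, scale=1.0, size=n_pairs)
# Stand-in for parameters consistent with real calibration data (assumption).
theta_cal = rng.normal(loc=2.0, scale=1.0, size=n_pairs)


def features(theta, t):
    """Hypothetical feature map: the velocity field is linear in these features."""
    return np.stack([np.ones_like(theta), theta, t, theta * t], axis=1)


# Flow matching regression: points on the straight-line path between paired
# samples are the inputs, and the displacement of the pair is the target.
t = rng.uniform(size=n_pairs)
theta_t = (1.0 - t) * theta_sim + t * theta_cal
target_velocity = theta_cal - theta_sim
weights, *_ = np.linalg.lstsq(features(theta_t, t), target_velocity, rcond=None)


def transport(theta0, n_steps=100):
    """Integrate the fitted velocity field from t=0 to t=1 with Euler steps."""
    theta = theta0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t_k = np.full_like(theta, k * dt)
        theta = theta + dt * (features(theta, t_k) @ weights)
    return theta


corrected = transport(theta_sim)
print("source mean:", theta_sim.mean(), "corrected mean:", corrected.mean())
```

In practice the velocity field would be a neural network conditioned on the observation, and the least-squares fit would be replaced by stochastic gradient descent on the same regression target.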

Paper Structure

This paper contains 17 sections, 18 equations, 6 figures, 2 algorithms.

Figures (6)

  • Figure 1: Overview of FMCPE. The method combines two complementary flow matching steps to correct simulation-based posterior distributions under model misspecification (represented by $\hat{p}_{\bm{\Theta}|\bm{X}}(\bm{\theta}|\bm{y})$ in grey level sets). (1) Scarce calibration data $(\bm{\theta}, \bm{y})$ are used to learn a transport map $T_{\bm{X}}$ that couples real observations $\bm{y}$ with surrogate counterparts $\tilde{\bm{x}}$ lying in the simulator's domain. (2) We then learn $T_{\bm{\Theta}}$ to transport samples from a $q_{\bm{X}|\bm{Y}}$-weighted version of the simulation-based posterior $\hat{p}_{\bm{\Theta}|\bm{X}}(\bm{\theta}|\tilde{\bm{x}})$ toward the final corrected posterior $\hat{p}_{\bm{\Theta}|\bm{Y}}$. Note that both transports are required: ${T}_{\bm{X}}$ addresses the mismatches between simulated data and real observations, while $T_{\bm{\Theta}}$ refines parameter inference to align with the true posterior.
  • Figure 2: Wasserstein distance (top row, $\boldsymbol{\downarrow}$ is better) and $j$C2ST (bottom row, $\boldsymbol{\downarrow}$ is better) with respect to an increasing calibration set size $N_{\text{cal}}\in \{10,50,200,1000\}$. Each boxplot shows the distribution of metric values across five independent runs, each using a different randomly chosen calibration set.
  • Figure 3: Kernel density estimates of joint and marginal samples for Tasks B (first row) and A (second row). For a given $\bm{y}^* \!\!\in {\cal D}_{\text{test}}$, we draw $\{\tilde{\bm{\theta}}_i\}_{1\leq i \leq 2000}$, for each method and 3 calibration sizes $N_{\text{cal}} \!\in\!\{ 10,50,200\}$. Dotted black lines indicate the true parameter ${\bm{\theta}}^*$ that generated ${\bm{y}}^*$.
  • Figure 4: MSE with respect to an increasing calibration set size $N_{\text{cal}}\in \{10,50,200,1000\}$. Each boxplot shows the distribution of MSE values across five independent runs, each using a different randomly chosen calibration set.
  • Figure 5: Kernel density estimates of the learned posteriors for task WindTunnel. For a given $\bm{y}^* \!\!\in \!{\cal D}_{\text{test}}$, we draw $\{\tilde{\bm{\theta}}_i\}_{1\leq i \leq 2000}$, for each method and 3 calibration sizes $N_{\text{cal}} \!\in\!\{ 10,50,200\}$. The dotted black line indicates the true parameter ${\bm{\theta}}^*$ that generated ${\bm{y}}^*$.
  • ...and 1 more figure