Consistency Models for Scalable and Fast Simulation-Based Inference

Marvin Schmitt; Valentin Pratz; Ullrich Köthe; Paul-Christian Bürkner; Stefan T Radev

TL;DR

CMPE is a new conditional sampler for SBI that inherits the advantages of recent unconstrained architectures while overcoming their sampling inefficiency at inference time. It comes with hyperparameters and default architectures that support consistency training over a wide range of dimensions.

Abstract

Simulation-based inference (SBI) is constantly in search of more expressive and efficient algorithms to accurately infer the parameters of complex simulation models. In line with this goal, we present consistency models for posterior estimation (CMPE), a new conditional sampler for SBI that inherits the advantages of recent unconstrained architectures and overcomes their sampling inefficiency at inference time. CMPE essentially distills a continuous probability flow and enables rapid few-shot inference with an unconstrained architecture that can be flexibly tailored to the structure of the estimation problem. We provide hyperparameters and default architectures that support consistency training over a wide range of different dimensions, including low-dimensional ones which are important in SBI workflows but were previously difficult to tackle even with unconditional consistency models. Our empirical evaluation demonstrates that CMPE not only outperforms current state-of-the-art algorithms on hard low-dimensional benchmarks, but also achieves competitive performance with much faster sampling speed on two realistic estimation problems with high data and/or parameter dimensions.
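To make the idea of few-step sampling concrete, the following is a minimal sketch of multistep consistency sampling for a conditional target, not the authors' implementation. Everything in it is an assumption made for illustration: the toy conjugate Gaussian model, the noise levels `EPS` and `T_MAX`, the intermediate time points, and above all `consistency_fn`, which is an analytic stand-in for a trained conditional consistency network.

```python
import math
import random

EPS = 0.001    # smallest noise level (assumed value, not from the paper)
T_MAX = 80.0   # largest noise level (assumed value, not from the paper)


def posterior_moments(x, noise_sd=0.5):
    """Exact posterior of the toy model theta ~ N(0, 1), x | theta ~ N(theta, noise_sd^2)."""
    var = noise_sd**2 / (1.0 + noise_sd**2)
    mean = x / (1.0 + noise_sd**2)
    return mean, math.sqrt(var)


def consistency_fn(theta_t, t, x):
    """Hypothetical stand-in for a trained network f(theta_t, t; x).

    For a Gaussian target, the probability flow ODE has a closed-form
    solution, so a point at noise level t can be mapped to level EPS
    exactly. The boundary condition f(theta, EPS; x) = theta holds.
    """
    mean, sd = posterior_moments(x)
    scale = math.sqrt(sd**2 + EPS**2) / math.sqrt(sd**2 + t**2)
    return mean + (theta_t - mean) * scale


def sample_posterior(x, time_points, rng):
    """Draw one sample using 1 + len(time_points) evaluations of consistency_fn.

    time_points must be decreasing noise levels strictly between EPS and T_MAX.
    """
    # First step: map pure noise at level T_MAX to an approximate posterior draw.
    theta = consistency_fn(rng.gauss(0.0, T_MAX), T_MAX, x)
    # Optional refinement steps: re-noise to level t, then denoise again.
    for t in time_points:
        noisy = theta + math.sqrt(t**2 - EPS**2) * rng.gauss(0.0, 1.0)
        theta = consistency_fn(noisy, t, x)
    return theta


rng = random.Random(0)
x_obs = 1.0
# K = 3 sampling steps: one initial step plus two refinement steps.
draws = [sample_posterior(x_obs, [10.0, 1.0], rng) for _ in range(5000)]
```

In this toy setting the draws can be compared against `posterior_moments(x_obs)`. With a learned network in place of `consistency_fn`, the number of time points is the knob that trades sampling speed against accuracy.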

Paper Structure (35 sections, 13 equations, 16 figures, 4 tables)

Introduction
Preliminaries and related work
Notation
Simulation-based inference (SBI)
Normalizing flows for neural posterior estimation
Flow matching for posterior estimation
Neural posterior score estimation
Consistency model posterior estimation
Conditional consistency models
Consistency models for simulation-based inference
Optimization objective
Hyperparameter tuning
Density estimation
Choosing the number of sampling steps
Empirical evaluation
...and 20 more sections

Figures (16)

Figure 1: Experiments 1--3. 1000 posterior draws for one unseen test instance per task, as well as sampling time in milliseconds. All amortized neural approximators were trained with a small budget of $M=1024$ simulations. The bottom row shows the posterior predictive distribution in the kinematics task, and the pink cross-hair indicates the true end location $\mathbf{x}$ of the robot arm. Across all benchmarks, CMPE (Ours) yields the best trade-off between fast sampling speed and high accuracy. ACF: affine coupling flow, NSF: neural spline flow, FMPE: flow matching posterior estimation, CMPE: consistency model posterior estimation (Ours), K# denotes $K$ sampling steps during inference.
Figure 2: C2ST score of 4000 approximate posterior draws vs. reference posterior (lower is better) for $J=100$ unseen test examples. (\ref{fig:speed-c2st-gmm}) CMPE (Ours) outperforms all other methods through both faster and more accurate inference on the GMM benchmark (mean$\pm$SD). (\ref{fig:budget-c2st-twomoons}) CMPE (Ours) with 10 sampling steps shows superior performance up to a training budget of 4096 instances on the Two Moons benchmark (mean$\pm$SE).
Figure 3: Experiment 4. CMPE denoising results on Fashion MNIST (U-Net backbone, $K=2$ sampling steps, 60 000 training images). First row: Original image (target parameters $\boldsymbol{\theta}$). Second row: Blurred image (observations $\mathbf{x}$). Third and fourth rows: Means and standard deviations of the approximate posteriors. Note: For standard deviations, darker regions indicate larger variability in the outputs. Adapted from \cite{radev2023jana}. More in Appendix \ref{sec:appendix-denoising}.
Figure 4: C2ST score of 4000 approximate posterior draws vs. reference posterior (lower is better); we report mean$\pm$SE over $J=100$ unseen test examples. (\ref{fig:budget-c2st-gmm}) For the GMM benchmark, we observe an unexpected dependency on the training budget. The C2ST score increases or stays approximately constant across all methods, indicating that in this regime a higher training budget leads to inferior performance, for example due to a tendency to overfit with more training data. It could also be a sign that the C2ST is not a good quality metric for this benchmark, but the monotonically decreasing curve for higher-quality samples (i.e., more sampling steps) for FMPE in \ref{fig:speed-c2st-gmm} indicates that the behavior can probably be attributed to the training budget and not to the metric. See \ref{sec:metrics} for a more detailed discussion. (\ref{fig:budget-c2st-invkinematics}) In this task, it is more challenging to achieve excellent C2ST scores because there is no aleatoric uncertainty in the data-generating process. CMPE outperforms ACF and NSF. FMPE performs best and can benefit most from the increased training budget.
Figure 5: Experiments 1--3. C2ST score of 4000 approximate posterior draws vs. reference posterior (lower is better); we report mean$\pm$SE over $J=100$ unseen test examples at different numbers of inference steps. The minimum C2ST value is achieved around 10--20 inference steps for every benchmark, after which the value increases again.
...and 11 more figures
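
Several captions above report a C2ST (classifier two-sample test) score, where lower is better. As a rough illustration of how such a score is read: a classifier is trained to tell approximate draws from reference draws, and its held-out accuracy is the score, so 0.5 means the two sample sets are indistinguishable and 1.0 means they are trivially separable. The sketch below is an assumption-laden toy: it uses one-dimensional samples and swaps in a simple k-nearest-neighbour classifier, and the function name `c2st_knn` and its parameters are hypothetical rather than taken from the paper.

```python
import random


def c2st_knn(samples_p, samples_q, k=5, seed=0):
    """Held-out accuracy of a k-NN classifier separating two 1-D sample sets.

    Values near 0.5 mean the sets look alike; values near 1.0 mean they differ.
    k should be odd so that the majority vote cannot tie.
    """
    rng = random.Random(seed)
    # Label draws from the first set 0 and draws from the second set 1.
    data = [(v, 0) for v in samples_p] + [(v, 1) for v in samples_q]
    rng.shuffle(data)
    split = len(data) // 2
    train, test = data[:split], data[split:]
    correct = 0
    for value, label in test:
        # Majority vote among the k closest training points.
        neighbours = sorted(train, key=lambda item: abs(item[0] - value))[:k]
        votes = sum(lbl for _, lbl in neighbours)
        predicted = 1 if 2 * votes > k else 0
        correct += int(predicted == label)
    return correct / len(test)


rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(400)]
good_approx = [rng.gauss(0.0, 1.0) for _ in range(400)]  # same distribution
bad_approx = [rng.gauss(3.0, 1.0) for _ in range(400)]   # shifted distribution

score_good = c2st_knn(good_approx, reference)
score_bad = c2st_knn(bad_approx, reference)
```

Here `score_good` should land close to 0.5 and `score_bad` well above it. Published evaluations typically use a stronger classifier and multivariate samples, so the numbers from this sketch are not comparable to those in the figures.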
