Contrastive Neural Ratio Estimation for Simulation-based Inference
Benjamin Kurt Miller, Christoph Weniger, Patrick Forré
TL;DR
This work introduces NRE-C, a generalization of likelihood-to-evidence ratio estimation for simulation-based inference (SBI) that avoids the bias inherent to the multiclass NRE-B setup. By adding an independent class and designing the loss accordingly, NRE-C eliminates the bias term $c_{w}(x)$ at optimum and recovers NRE-A in one corner case and NRE-B in a limiting case, while enabling informative diagnostics. The authors propose mutual-information bounds and an importance-sampling diagnostic to assess ratio quality. They validate performance in three settings: unlimited jointly drawn data, fixed data with unlimited prior draws, and the sbibm benchmark. The results show superior posterior accuracy and robust diagnostics with $K>1$ and $\gamma\approx1$. They also show that normalizing the posterior is feasible within finite data regimes, and that mutual-information-based metrics can guide model selection without ground-truth posteriors. In short, the likelihood-to-evidence ratio $r(\boldsymbol{x}|\boldsymbol{\theta})$ is estimated via a multiclass classifier whose optimum is bias-free, which simplifies normalization and enables reliable diagnostics in practical SBI applications.
Abstract
Likelihood-to-evidence ratio estimation is usually cast as either a binary (NRE-A) or a multiclass (NRE-B) classification task. In contrast to the binary classification framework, the current formulation of the multiclass version has an intrinsic and unknown bias term, making otherwise informative diagnostics unreliable. We propose a multiclass framework free from the bias inherent to NRE-B at optimum, leaving us in the position to run diagnostics that practitioners depend on. It also recovers NRE-A in one corner case and NRE-B in the limiting case. For fair comparison, we benchmark the behavior of all algorithms in both familiar and novel training regimes: when jointly drawn data is unlimited, when data is fixed but prior draws are unlimited, and in the commonplace fixed data and parameters setting. Our investigations reveal that the highest performing models are distant from the competitors (NRE-A, NRE-B) in hyperparameter space. We make a recommendation for hyperparameters distinct from the previous models. We suggest two bounds on the mutual information as performance metrics for simulation-based inference methods, without the need for posterior samples, and provide experimental results. This version corrects a minor implementation error in $\gamma$, improving results.
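
The corner-case and limiting-case claims can be checked numerically with a small sketch. It assumes a parametrization that this summary does not spell out: with $K$ contrastive parameters and log-ratio estimates $h(\boldsymbol{\theta}_k, \boldsymbol{x})$, the independent class receives unnormalized weight $K$ and each dependent class receives weight $\gamma \exp h(\boldsymbol{\theta}_k, \boldsymbol{x})$. The function name `nre_c_class_probs` and the example logits are hypothetical, chosen only for illustration.

```python
import math


def nre_c_class_probs(logits, gamma):
    """Return [q(y=0), q(y=1), ..., q(y=K)] for K = len(logits).

    logits[k] stands in for the log-ratio estimate h(theta_k, x).
    Class y=0 is the independent class (assumed weight K); class
    y=k has assumed weight gamma * exp(h_k).
    """
    K = len(logits)
    # Log of the unnormalized class weights, independent class first.
    log_w = [math.log(K)] + [math.log(gamma) + h for h in logits]
    # Log-sum-exp normalization for numerical stability.
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w))
    return [math.exp(v - log_z) for v in log_w]


def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))


def softmax(values):
    m = max(values)
    e = [math.exp(v - m) for v in values]
    total = sum(e)
    return [v / total for v in e]


# Corner case K = 1, gamma = 1: the dependent-class probability
# coincides with the binary (NRE-A style) sigmoid of the logit.
binary = nre_c_class_probs([0.7], gamma=1.0)

# Limiting case gamma -> infinity: the independent class vanishes and
# the remaining probabilities approach the NRE-B style softmax.
example_logits = [0.2, -1.0, 1.5]
limit = nre_c_class_probs(example_logits, gamma=1e12)
```

Under this assumed form, `binary[1]` matches `sigmoid(0.7)` and `limit[1:]` is numerically close to `softmax(example_logits)`, which mirrors the statement that the framework recovers NRE-A in one corner case and NRE-B in the limiting case.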
