Simultaneous identification of models and parameters of scientific simulators
Cornelius Schröder, Jakob H. Macke
TL;DR
This work addresses the problem of identifying both the component structure and the parameters of compositional scientific simulators in likelihood-free settings. It introduces simulation-based model inference (SBMI), which jointly infers the model posterior $p(M|x)$ and the parameter posterior $p(\theta|M,x)$ with amortized neural networks: a conditional mixture of Grassmann distributions (MoGr) for the model posterior and a marginalized Gaussian mixture density network for the parameter posterior, with a graph-based model prior guiding component inclusion. The method is demonstrated on additive, drift-diffusion, and Hodgkin-Huxley models, where it reveals multiple data-consistent configurations, exposes non-identifiable components, and delivers calibrated predictive posteriors and interpretable interactions between model components. These results enable data-driven, uncertainty-aware comparison of model compositions and support principled integration of domain knowledge in complex scientific modeling. Because SBMI is amortized, inference on new data is fast, and the framework can be extended to simulators with varying outputs and to symbolic-regression-like analyses, which makes it potentially useful wherever modular, interacting mechanisms underlie observed phenomena.
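As a reading aid, the two posteriors named above are the factors of the joint posterior over models and parameters. The identities below are standard probability rather than formulas quoted from the paper; $p(M)$ and $p(\theta|M)$ denote the model and parameter priors, and the likelihood $p(x|M,\theta)$ is assumed to be available only implicitly through the simulator:

$$p(M, \theta \mid x) = p(M \mid x)\, p(\theta \mid M, x), \qquad p(M \mid x) \propto p(M) \int p(x \mid M, \theta)\, p(\theta \mid M)\, d\theta$$

The integral on the right is the intractable marginal likelihood, which is why the model posterior is learned from simulations instead of being computed directly.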
Abstract
Many scientific models are composed of multiple discrete components, and scientists often make heuristic decisions about which components to include. Bayesian inference provides a mathematical framework for systematically selecting model components, but defining prior distributions over model components and developing associated inference schemes has been challenging. We approach this problem in a simulation-based inference framework: we define model priors over candidate components and, from model simulations, train neural networks to infer joint probability distributions over both model components and associated parameters. Our method, simulation-based model inference (SBMI), represents distributions over model components as a conditional mixture of multivariate binary distributions in the Grassmann formalism. SBMI can be applied to any compositional stochastic simulator without requiring likelihood evaluations. We evaluate SBMI on a simple time series model and on two scientific models from neuroscience, and show that it can discover multiple data-consistent model configurations and that it reveals non-identifiable model components and parameters. SBMI provides a powerful tool for data-driven scientific inquiry that allows scientists to identify essential model components and make uncertainty-informed modelling decisions.
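To make the data-generation step concrete, here is a minimal sketch of how training tuples $(M, \theta, x)$ could be drawn from a compositional simulator. Everything in it is an illustrative assumption and not the paper's setup: the three components, the uniform parameter prior, the noise level, and the function names are hypothetical, and an independent Bernoulli inclusion prior stands in for the graph-based model prior. The neural posterior networks themselves are not shown.

```python
import numpy as np

T = np.linspace(0.0, 10.0, 100)  # time grid (assumed)

# Hypothetical component library: (name, number of parameters, function).
COMPONENTS = [
    ("linear", 1, lambda t, p: p[0] * t),
    ("quadratic", 1, lambda t, p: p[0] * t**2),
    ("sine", 2, lambda t, p: p[0] * np.sin(p[1] * t)),
]


def sample_model(rng, p_include=0.5):
    """Draw a binary inclusion vector M, resampling until one component is on."""
    while True:
        m = rng.random(len(COMPONENTS)) < p_include
        if m.any():
            return m


def sample_parameters(rng, m):
    """Draw parameters for active components; inactive ones are filled with NaN."""
    return [
        rng.uniform(-1.0, 1.0, size=k) if on else np.full(k, np.nan)
        for on, (_, k, _) in zip(m, COMPONENTS)
    ]


def simulate(rng, m, theta, noise_std=0.1):
    """Sum the active components on the time grid and add Gaussian noise."""
    x = np.zeros_like(T)
    for on, params, (_, _, f) in zip(m, theta, COMPONENTS):
        if on:
            x += f(T, params)
    return x + rng.normal(0.0, noise_std, size=T.shape)


def make_training_set(n, seed=0):
    """Return n tuples (M, theta, x) sampled from the priors and the simulator."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n):
        m = sample_model(rng)
        theta = sample_parameters(rng, m)
        data.append((m, theta, simulate(rng, m, theta)))
    return data
```

Tuples like these are what an amortized approach would train on: one network would learn to map $x$ to a distribution over $M$, and another to map $(M, x)$ to a distribution over the parameters of the active components. Only the simulator is ever called; no likelihood is evaluated.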
