Variational Transdimensional Inference

Laurence Davies, Dan Mackinlay, Rafael Oliveira, Scott A. Sisson

TL;DR

This work tackles variational inference over transdimensional spaces by introducing CoSMIC, a contextually masked normalizing flow that enables a single variational density to approximate a transdimensional posterior $\pi(m,\boldsymbol{\theta}_m)$. A dimension-saturation trick augments every model-specific parameter space to a common dimension $d_{\max}$ with auxiliary variables, which yields an exact factorization and makes training tractable. The authors present two training pathways for the model probabilities: a Gaussian-process (GP) surrogate with an upper-confidence-bound (UCB) rule to approximate the model weights, and, for large model spaces, scalable parametric mass functions built on MADE and trained with Monte Carlo gradient estimators that use variance reduction for the discrete variables. They provide theoretical convergence guarantees under sub-Gaussian noise and demonstrate strong performance on robust variable selection and non-linear DAG discovery with high-cardinality model spaces, indicating practical applicability to model selection and structure learning. Overall, the method broadens the applicability of flow-based VI to transdimensional problems and offers scalable strategies for complex model spaces.
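The dimension-saturation idea can be illustrated with a minimal sketch. It assumes that a model $m$ is encoded as a binary mask over $d_{\max}$ coordinates and that the auxiliary variables are standard normal; the helper names `saturate` and `saturated_log_target` and the toy target are hypothetical and are not the authors' implementation.

```python
import math
import random

def std_normal_logpdf(x):
    # log N(x; 0, 1)
    return -0.5 * (x * x + math.log(2.0 * math.pi))

def saturate(theta_m, mask, rng):
    """Embed the parameters of model m into a d_max-dimensional vector.

    mask[j] is True when coordinate j is used by model m; theta_m lists the
    active values in order. Inactive slots receive auxiliary draws
    u_j ~ N(0, 1) (an assumed choice of auxiliary density).
    """
    assert len(theta_m) == sum(mask), "one value per active coordinate"
    active = iter(theta_m)
    return [next(active) if on else rng.gauss(0.0, 1.0) for on in mask]

def saturated_log_target(x, mask, log_target):
    """log pi(m, theta_m) plus the auxiliary log-density of the unused slots.

    Integrating the auxiliary slots out recovers pi(m, theta_m), so every
    model now lives on the same d_max-dimensional space.
    """
    theta_m = [xj for xj, on in zip(x, mask) if on]
    aux = sum(std_normal_logpdf(xj) for xj, on in zip(x, mask) if not on)
    return log_target(mask, theta_m) + aux

# Usage: d_max = 4, and model m uses coordinates 0 and 2.
rng = random.Random(0)
mask = (True, False, True, False)
x = saturate([0.3, -1.2], mask, rng)
toy_log_target = lambda m, th: -0.5 * sum(t * t for t in th)  # hypothetical target
print(len(x), saturated_log_target(x, mask, toy_log_target))
```

Because the auxiliary density integrates to one, the augmented target has the same model marginals as $\pi(m,\boldsymbol{\theta}_m)$, which is what lets one fixed-dimension flow cover all models.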

Abstract

The expressiveness of flow-based models combined with stochastic variational inference (SVI) has expanded the application of optimization-based Bayesian inference to highly complex problems. However, despite the importance of multi-model Bayesian inference for problems defined on a transdimensional joint model and parameter space, such as Bayesian structure learning and model selection, flow-based SVI has been limited to problems defined on a fixed-dimensional parameter space. We introduce CoSMIC normalizing flows (COntextually-Specified Masking for Identity-mapped Components), an extension of neural autoregressive conditional normalizing flow architectures that enables use of a single flow-based variational density for inference over a transdimensional (multi-model) conditional target distribution. We propose a combined stochastic variational transdimensional inference (VTI) approach to training CoSMIC flows using ideas from Bayesian optimization and Monte Carlo gradient estimation. Numerical experiments show the performance of VTI on challenging problems that scale to high-cardinality model spaces.
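For the Monte Carlo gradient estimation mentioned in the abstract, a generic score-function (REINFORCE) estimator for a categorical distribution over models, with a leave-one-out baseline for variance reduction, might look like the sketch below. The softmax parameterization, the objective `f`, and the baseline choice are assumptions for illustration; the paper's actual estimators and variance-reduction scheme may differ.

```python
import math
import random

def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rloo_score_gradient(logits, f, num_samples, rng):
    """Estimate d/d(logits) of E_{m ~ q}[f(m)] for q = softmax(logits).

    Score-function (REINFORCE) estimator: grad log q(m) = onehot(m) - q.
    Each sample's baseline is the mean of f over the other samples
    (leave-one-out), which lowers variance without introducing bias.
    """
    assert num_samples >= 2
    q = softmax(logits)
    ms = rng.choices(range(len(q)), weights=q, k=num_samples)
    fs = [f(m) for m in ms]
    total = sum(fs)
    grad = [0.0] * len(q)
    for m, fm in zip(ms, fs):
        advantage = fm - (total - fm) / (num_samples - 1)
        for k in range(len(q)):
            grad[k] += advantage * ((1.0 if k == m else 0.0) - q[k])
    return [g / num_samples for g in grad]

def exact_gradient(logits, f):
    # Closed form for a categorical: dE[f]/dlogit_k = q_k * (f(k) - E[f]).
    q = softmax(logits)
    mean_f = sum(qk * f(k) for k, qk in enumerate(q))
    return [qk * (f(k) - mean_f) for k, qk in enumerate(q)]

# Usage: three "models" with a hypothetical per-model objective f.
rewards = [1.0, 2.0, 4.0]
f = lambda m: rewards[m]
est = rloo_score_gradient([0.0, 0.0, 0.0], f, 20000, random.Random(1))
print(est, exact_gradient([0.0, 0.0, 0.0], f))
```

The leave-one-out baseline is unbiased because each sample's baseline is independent of that sample, and the expected score is zero.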

Paper Structure

This paper contains 60 sections, 9 theorems, 89 equations, 11 figures, 2 tables, 4 algorithms.

Key Result

Lemma 2.1

For a CoSMIC transform $(\boldsymbol{\theta}_{m},\boldsymbol{u}_{\setminus m})=T_\phi(\boldsymbol{z}_m,\boldsymbol{z}_{\setminus m})$, we have $\boldsymbol{u}_{\setminus m}=\boldsymbol{z}_{\setminus m}$ for all $m\in\mathcal{M}$.
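Lemma 2.1 can be made concrete with a toy context-masked affine IAF step, loosely matching the single CoSMIC IAF step of Figure 1(c). This is a sketch under stated assumptions: the conditioner functions `shift_fn` and `log_scale_fn` are hypothetical placeholders for a masked neural network, and the choice that active coordinates condition only on earlier active coordinates is an assumption made here for illustration.

```python
import math

def masked_affine_iaf_step(z, mask, shift_fn, log_scale_fn):
    """One context-masked affine IAF step (illustrative sketch).

    Active coordinate j (mask[j] True): x_j = z_j * exp(s_j) + t_j, where
    (t_j, s_j) depend only on earlier active base coordinates and the mask.
    Inactive coordinate: the conditioner output is masked to (t_j, s_j) = (0, 0),
    which reduces the update to x_j = z_j, the identity map of Lemma 2.1.
    Returns (x, log|det J|). The Jacobian is triangular, so the log-determinant
    is the sum of s_j over the active coordinates only.
    """
    x, log_det = [], 0.0
    for j, zj in enumerate(z):
        if not mask[j]:
            x.append(zj)  # identity: u_j = z_j
            continue
        prev_active = [z[i] for i in range(j) if mask[i]]
        t = shift_fn(j, prev_active, mask)
        s = log_scale_fn(j, prev_active, mask)
        x.append(zj * math.exp(s) + t)
        log_det += s
    return x, log_det

# Usage with hypothetical conditioners.
shift = lambda j, prev, mask: 0.1 * sum(prev)
log_scale = lambda j, prev, mask: 0.2
z = [0.5, -1.0, 2.0, 0.1]
mask = (True, False, True, False)
x, log_det = masked_affine_iaf_step(z, mask, shift, log_scale)
print(x, log_det)
```

Since inactive coordinates pass through unchanged and contribute nothing to the log-determinant, composing such steps keeps $\boldsymbol{u}_{\setminus m}=\boldsymbol{z}_{\setminus m}$ for every model.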

Figures (11)

  • Figure 1: (a) CoSMIC flow composition, (b) Context-to-mask map, (c) A single CoSMIC IAF step.
  • Figure 2: Quality of VTI approximation for Bayesian misspecified robust variable selection. Outer columns denote medium (left) or high (right) likelihood misspecification; inner columns indicate different normalizing flow constructions, with flow expressivity increasing from left to right. Flow types are described in the appendix. Top row: estimated model probabilities $q_{\psi}(m)$ vs. true model probabilities $\pi(m)$ on the log scale. Bottom row: cross entropy between individual model estimates $q_{\phi}(\boldsymbol{\theta}_{m}|m)$ and true density $\pi(\boldsymbol{\theta}_{m}|m)$ versus true model probability. Colors indicate 10 replicated analyses, each with $|\mathcal{M}|=2^7$ models.
  • Figure 3: Left: a simulation study of the robust variable selection example showing the cross entropy (NLL) between RJMCMC samples and a flow-based variational transdimensional density using rational quadratic spline CoSMIC flows, under a fixed number of iterations (30,000). Each cardinality was run with 10 independently sampled synthetic data sets. Right: comparison of bivariate plots of the variables $(\boldsymbol{\theta}_{m}^{(1)},\boldsymbol{\theta}_{m}^{(5)})$ obtained by RJMCMC and VTI for a single $|\mathcal{M}|=2^7$ problem.
  • Figure 4: Simulation study comparing VTI to DAGMA (Bello et al., 2022), DiBS/DiBS+ (Lorch et al., 2021), and JSP-GFlowNets (Deleu et al., 2023) for discovery of a 10-node non-linear DAG, visualized using standard metrics (described in the appendix; from left to right, higher, lower, lower, and higher values are better). Bars display the mean and standard error over nine i.i.d. repetitions for each data set size.
  • Figure 5: As Figure 2 (main text), but with no misspecification ($\sigma_1=1,\sigma_2=10$) and a focused prior ($\sigma_\beta=1.5$). Circles indicate the null model (constant only, no predictors); triangles indicate the data-generating process.
  • ...and 6 more figures
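The cross entropy (NLL) between RJMCMC samples and the variational density reported in Figure 3 can, as a sketch, be estimated by averaging the negative joint log-density $-\log q(m,\boldsymbol{\theta}_m)$ over the reference samples. The toy two-model $q$ below (a categorical $q(m)$ times standard-normal $q(\boldsymbol{\theta}_m|m)$ of model-specific dimension) is hypothetical and only serves as a sanity check.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def cross_entropy_nll(samples, log_q):
    """Monte Carlo estimate of -E_pi[log q(m, theta)] from reference samples.

    samples are (m, theta) pairs assumed to be drawn from the target pi
    (e.g. an RJMCMC run); log_q is the joint variational log-density.
    """
    return -sum(log_q(m, theta) for m, theta in samples) / len(samples)

# Hypothetical transdimensional q: q(m) over two models, with
# q(theta | m) a standard normal of dimension dims[m].
q_m = [0.3, 0.7]
dims = [1, 2]

def log_q_joint(m, theta):
    log_q_theta = sum(-0.5 * (t * t + LOG_2PI) for t in theta)
    return math.log(q_m[m]) + log_q_theta

def sample_q(n, rng):
    out = []
    for _ in range(n):
        m = rng.choices([0, 1], weights=q_m)[0]
        out.append((m, [rng.gauss(0.0, 1.0) for _ in range(dims[m])]))
    return out

# Sanity check: when the "reference" samples come from q itself, the
# estimate approaches the entropy of q.
samples = sample_q(50000, random.Random(2))
entropy = (-sum(p * math.log(p) for p in q_m)
           + sum(p * d for p, d in zip(q_m, dims)) * 0.5 * (LOG_2PI + 1.0))
print(cross_entropy_nll(samples, log_q_joint), entropy)
```

With genuine RJMCMC samples from $\pi$, the same estimator gives the cross entropy of $q$ relative to $\pi$, which is minimized when $q$ matches $\pi$.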

Theorems & Definitions (19)

  • Lemma 2.1
  • Proposition 2.2
  • Corollary 2.3
  • Corollary 3.1
  • proof: Proof of Lemma 2.1
  • proof: Proof of Proposition 2.2
  • proof: Proof of the cancellation proposition
  • Corollary B.1: Computational complexity
  • proof
  • Lemma C.1
  • ...and 9 more