Delta-AI: Local objectives for amortized inference in sparse graphical models

Jean-Pierre Falet; Hae Beom Lee; Nikolay Malkin; Chen Sun; Dragos Secrieru; Thomas Jiralerspong; Dinghuai Zhang; Guillaume Lajoie; Yoshua Bengio

TL;DR

Δ-AI introduces a local, structure-aware objective for amortized inference in sparse PGMs by enforcing local equality of conditional distributions between a Markov network and a chordal Bayesian network. By training a single parametric sampler with losses that depend only on a variable and its Markov blanket, it achieves fast, off-policy learning and can amortize over multiple DAG orders. The approach yields faster wall-clock convergence than traditional GFlowNets and outperforms unstructured amortized methods and MCMC in synthetic experiments, while enabling partial-subset inference in real-data VAEs. The work also connects to continuous-space score matching and offers a bilevel training framework where the amortized sampler aids the training of energy-based or latent-variable models. Overall, Δ-AI provides a scalable, locality-driven path to accurate amortized inference in sparse graphical models with practical implications for deep generative modeling and structure-aware learning.

Abstract

We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $\Delta$-amortized inference ($\Delta$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $\Delta$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
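The local matching idea in the abstract can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the authors' implementation: the 3-variable Ising chain, the coupling `J`, and the helper names (`log_factor`, `log_q_cond`, `local_residual`) are all hypothetical, and the "sampler" is the exact Bayesian-network conditional obtained by brute-force enumeration rather than a learned network. It compares two assignments that differ in a single variable and verifies that the log-ratio under the sampler, which involves only that variable and its children, equals the log-ratio under the target, which involves only the factors touching that variable.

```python
import itertools
import math

# Hypothetical toy target: Ising chain v0 - v1 - v2 with spins in {-1, +1}.
J = 0.7                              # illustrative coupling strength (assumption)
EDGES = [(0, 1), (1, 2)]             # pairwise factors of the Markov network
PARENTS = {0: [], 1: [0], 2: [1]}    # DAG v0 -> v1 -> v2 (a chain is already chordal)
CHILDREN = {0: [1], 1: [2], 2: []}
STATES = list(itertools.product([-1, 1], repeat=3))

def log_factor(edge, x):
    """Log of one pairwise factor evaluated at the full assignment x."""
    a, b = edge
    return J * x[a] * x[b]

def unnorm_p(x):
    """Unnormalized target density; the normalizer cancels in every ratio below."""
    return math.exp(sum(log_factor(e, x) for e in EDGES))

def log_q_cond(i, x):
    """Exact log p(x_i | parents of i) by enumeration; stands in for a learned sampler."""
    consistent = [s for s in STATES if all(s[j] == x[j] for j in PARENTS[i])]
    num = sum(unnorm_p(s) for s in consistent if s[i] == x[i])
    den = sum(unnorm_p(s) for s in consistent)
    return math.log(num / den)

def local_residual(i, x):
    """Sampler log-ratio minus target log-ratio when only variable i is flipped."""
    x_alt = tuple(-v if k == i else v for k, v in enumerate(x))
    # Sampler side: only the conditionals of i and of its children change.
    q_side = sum(log_q_cond(j, x) - log_q_cond(j, x_alt) for j in [i] + CHILDREN[i])
    # Target side: only the factors that contain variable i change.
    p_side = sum(log_factor(e, x) - log_factor(e, x_alt) for e in EDGES if i in e)
    return q_side - p_side

max_residual = max(abs(local_residual(i, x)) for i in range(3) for x in STATES)
print(max_residual)  # zero up to floating-point error for the exact sampler
```

With a parametric sampler in place of the exact conditionals, the square of such a residual is one natural choice of local loss: each evaluation touches a single variable and its Markov blanket rather than the full assignment.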

Paper Structure (60 sections, 1 theorem, 22 equations, 13 figures, 1 algorithm)

Introduction
Background
Probabilistic graphical models
Undirected graphical models (Markov networks / factor graphs).
Directed graphical models (Bayesian networks).
From Bayesian networks to Markov networks and back.
Training and amortization in PGMs
Maximum-likelihood training of PGMs.
Amortized inference.
Amortized inference with generative flow networks
Trajectory balance.
Detailed balance.
Local constraints for matching Markov and Bayesian networks
Setting.
$\Delta$-AI constraint.
...and 45 more sections

Key Result

Proposition 1

Suppose $n>1$. Let $S = \sum_{i=1}^n f_i$ and $L=\frac{1}{2}(g + S)^2$, where $g$ and each $f_i$ are functions of $\theta$. Let … with $\bar{g}$ and $\bar{f}_i$ indicating that gradients are blocked, i.e., $\frac{\partial \bar{g}}{\partial \theta}=0$ and $\frac{\partial \bar{f}_i}{\partial \theta}=0$. Then … where $\mathbb{E}_i[\cdot]$ is a uniform average over indices $i$ in $\{1,\dots,n\}$ and …

Figures (13)

Figure 1: Summary of the relationships between undirected graphs defining Markov networks (left) and DAGs defining Bayesian networks (right). Chordalization strictly relaxes the conditional independence constraints on a Markov network, while Markov networks with respect to a chordal graph and Bayesian networks with respect to its P-map are equivalent.
Figure 2: Generating and amortizing multiple DAG orders. The conditionals present in two I-maps (DAGs) for the same undirected model $p$ can differ: for example, the conditional $p(v_1\mid v_2,v_3)$ appears in the two DAGs in the second row, but not in those in the first row. $\Delta$-AI learns a model $q$ that matches the conditionals in the target distribution $p$. If $q$ has a structure that allows taking varying subsets of variables as input, then it can be trained to match the conditionals appearing in multiple DAG structures simultaneously, and the resulting sampler can then be used for sampling in any of these DAGs.
Figure 3: Graphical models. (a) and (b) are UGMs for Ising models and (c) shows the factor graph model, where each factor is a small randomly initialized MLP with four arguments. (a) is chordal and (b,c) are non-chordal.
Figure 4: Comparison of $\Delta$-AI and GFlowNets. $\Delta$-AI converges to the target distribution fastest.
Figure 5: Comparison against MCMC. (a, b) chordal graph, (c, d) non-chordal graph. $\Delta$-AI provides a substantial amortization benefit, with training time smaller than the mixing time of MCMC chains.
...and 8 more figures
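
The captions of Figures 1 and 2 refer to chordalization and to multiple DAG orders for one undirected model. The sketch below illustrates both with a generic textbook procedure, triangulation by variable elimination, which is an assumption here and not necessarily the procedure used in the paper. The function names `chordalize` and `dag_parents` and the 4-cycle example are hypothetical. Eliminating nodes in a chosen order adds fill-in edges until the graph is chordal; orienting each edge toward the earlier-eliminated endpoint then gives a DAG in which every node's parents form a clique.

```python
import itertools

def chordalize(n, edges, order):
    """Triangulate an undirected graph by eliminating nodes in `order`.

    Returns the original edges plus any fill-in edges, as a set of frozensets.
    """
    work = {v: set() for v in range(n)}
    for a, b in edges:
        work[a].add(b)
        work[b].add(a)
    chordal_edges = {frozenset(e) for e in edges}
    for v in order:
        nbrs = sorted(work[v])
        # Connect all remaining neighbours of v so that they form a clique.
        for a, b in itertools.combinations(nbrs, 2):
            if b not in work[a]:
                work[a].add(b)
                work[b].add(a)
                chordal_edges.add(frozenset((a, b)))
        # Remove v from the working graph.
        for u in nbrs:
            work[u].discard(v)
        del work[v]
    return chordal_edges

def dag_parents(n, chordal_edges, order):
    """Orient each edge toward the endpoint that was eliminated earlier."""
    pos = {v: k for k, v in enumerate(order)}
    parents = {v: [] for v in range(n)}
    for e in chordal_edges:
        a, b = tuple(e)
        child, parent = (a, b) if pos[a] < pos[b] else (b, a)
        parents[child].append(parent)
    return {v: sorted(ps) for v, ps in parents.items()}

# A 4-cycle is non-chordal; two elimination orders yield two different DAGs.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
order_a, order_b = [0, 1, 2, 3], [1, 0, 3, 2]
edges_a = chordalize(4, cycle, order_a)
edges_b = chordalize(4, cycle, order_b)
parents_a = dag_parents(4, edges_a, order_a)
parents_b = dag_parents(4, edges_b, order_b)
print(parents_a)
print(parents_b)
```

Sampling in such a DAG proceeds in the reverse of the elimination order. The two orders produce different parent sets, so a sampler that amortizes over both would need to accept varying subsets of variables as conditioning input, as the Figure 2 caption notes.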

Theorems & Definitions (2)

Proposition 1
proof
