Ai-Sampler: Adversarial Learning of Markov kernels with involutive maps

Evgenii Egorov; Ricardo Valperga; Efstratios Gavves

TL;DR

Ai-Sampler proposes an adversarial MCMC framework in which transition kernels are parameterized by involutive maps built from time-reversible neural networks, ensuring detailed balance by construction. The core idea is to train a deterministic involutive proposal via a discriminator that approximates the density ratio, with a bootstrap process to progressively improve sampling quality; the objective upper-bounds the total variation distance to the target using Pinsker’s inequality. A $C_2$-equivariant discriminator enforces symmetry under the involution, and two discriminator parameterizations are offered: a product form and a more general linear–nonlinear composition. Empirical results on 2D multimodal densities and Bayesian logistic regression demonstrate competitive ESS and favorable running times, with strong mixing and scalability on accelerators, highlighting Ai-Sampler as a robust alternative to baselines like HMC and NICE-based methods.
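For reference, the inequality named in the summary above can be stated as follows. This is the standard form of Pinsker's inequality for two distributions $P$ and $Q$; how exactly the paper instantiates $P$ and $Q$ and which divergence the discriminator estimates are not shown in this summary, so the symbols here are generic.

```latex
% Pinsker's inequality (standard form), with total variation defined as
% TV(P, Q) = sup_A |P(A) - Q(A)| and KL the Kullback-Leibler divergence.
\[
  \mathrm{TV}(P, Q) \;\le\; \sqrt{\tfrac{1}{2}\,\mathrm{KL}(P \,\|\, Q)} .
\]
```

A consequence is that driving a KL-type objective toward zero also drives the total variation distance toward zero, which is the sense in which such an objective upper-bounds the distance to the target.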

Abstract

Markov chain Monte Carlo methods have become popular in statistics as versatile techniques to sample from complicated probability distributions. In this work, we propose a method to parameterize and train transition kernels of Markov chains to achieve efficient sampling and good mixing. This training procedure minimizes the total variation distance between the stationary distribution of the chain and the empirical distribution of the data. Our approach leverages involutive Metropolis-Hastings kernels constructed from reversible neural networks that ensure detailed balance by construction. We find that reversibility also implies $C_2$-equivariance of the discriminator function, which can be used to restrict its function space.
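To make the idea of an involutive Metropolis-Hastings kernel concrete, here is a minimal sketch in NumPy. It is not the paper's learned kernel: the reversible neural network is swapped for a plain leapfrog integrator, the target is an illustrative 1D standard normal, and the names `grad_U`, `log_joint`, `leapfrog`, `involution`, and `sample` are hypothetical. The proposal is the deterministic map $f = R \circ L$ with $R: (q, p) \mapsto (q, -p)$; because leapfrog is volume-preserving, the acceptance ratio needs no Jacobian term.

```python
import numpy as np

def grad_U(q):
    # Gradient of U(q) = q^2 / 2, i.e. a standard normal target (illustrative choice).
    return q

def log_joint(q, p):
    # Unnormalized log density of the augmented state (q, p), with Gaussian momentum.
    return -0.5 * np.sum(q**2) - 0.5 * np.sum(p**2)

def leapfrog(q, p, eps=0.3, n_steps=5):
    # L: a time-reversible, volume-preserving map standing in for a learned network.
    p = p - 0.5 * eps * grad_U(q)
    for i in range(n_steps):
        q = q + eps * p
        if i < n_steps - 1:
            p = p - eps * grad_U(q)
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def involution(q, p):
    # f = R o L with R: (q, p) -> (q, -p); applying f twice returns the input.
    q_new, p_new = leapfrog(q, p)
    return q_new, -p_new

def sample(n, rng):
    q = np.zeros(1)
    out = np.empty(n)
    for t in range(n):
        p = rng.standard_normal(1)             # refresh the auxiliary variable
        q_prop, p_prop = involution(q, p)      # deterministic involutive proposal
        # MH log-ratio; |det J_f| = 1 here because f is volume-preserving.
        log_alpha = log_joint(q_prop, p_prop) - log_joint(q, p)
        if np.log(rng.uniform()) < log_alpha:
            q = q_prop
        out[t] = q[0]
    return out

rng = np.random.default_rng(0)
samples = sample(5000, rng)
```

With a proposal that is not volume-preserving, the log-ratio would also need the log absolute Jacobian determinant of the involution.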

Paper Structure (38 sections, 1 theorem, 37 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Parametrizing Kernels with Involutions
Parameterizing Involutions
Time-reversible neural networks
Reversing symmetries in Hamiltonian MC kernels.
Decomposing and parameterizing reversible maps.
Involutive MCMC kernels by construction.
Adversarial Training for Involutive Kernel
Bootstrap
The adversarial MH kernel
Equivariance of the discriminator under $R \circ L_\theta$
Discriminator parametrization
Discriminator with product parameterization.
$C_2$-equivariant composition of linear maps and non-linear activations.
...and 23 more sections

Key Result

Theorem 4.1

[valperga2022learning] Let $L : \mathbb{R}^{D}\to\mathbb{R}^{D}$ be an $R$-reversible diffeomorphism, with $R$ being a linear involution. (Footnote: the diffeomorphism must be smoothly isotopic to the identity, a mild condition for sufficiently well-behaved target functions.) Then, there exists a unique diffeomorphism $g: \mathbb

Figures (8)

Figure 1: Schematic representation of an involution constructed from a time-reversible diffeomorphism $L$. For an $R$-reversible diffeomorphism $L$, with $R: (q, p) \mapsto (q, -p)$, the composition $R\circ L\circ R\circ L$ is the identity.
Figure 2: Synthetic 2D densities used in the experiments. From left to right: mog2, mog6, ring, and ring5. Top row: true density. Bottom row: KDE with samples from our Ai-Sampler.
Figure 3: Single MCMC trajectory with the learned kernel (top) on the mog2 and mog6 synthetic 2D densities, compared to HMC (bottom). The low density regions make it unlikely for HMC to get from one mode to another.
Figure 4: Discriminator as a function of two inputs as in Eq. \ref{eq:two-input-discriminator}, for three different values of $x$: one far from the six modes and two at the center of one mode.
Figure 5: Time vs. number of parallel chains for a single RTX3090 GPU, sampling from the Bayesian logistic regression posterior with the German dataset. Every chain consists of 100 steps. For more than $10^{4}$ parallel chains, the JIT compilation time becomes prohibitively long, so larger runs are not worthwhile.
...and 3 more figures
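The claim in the Figure 1 caption, that $R\circ L\circ R\circ L$ is the identity for an $R$-reversible $L$, can be checked numerically with a small sketch. The construction $L = R \circ g^{-1} \circ R \circ g$ used below is one way to obtain an $R$-reversible map from an arbitrary invertible $g$; whether it matches the decomposition in the paper's Theorem 4.1 is an assumption, and the specific shear maps inside `g` are hypothetical stand-ins for a neural network.

```python
import numpy as np

# R: (q, p) -> (q, -p), a linear involution (R @ R is the identity matrix).
R = np.diag([1.0, -1.0])

def g(x):
    # An arbitrary invertible map built from two shears (illustrative only).
    q, p = x
    p = p + np.tanh(q)
    q = q + 0.5 * np.sin(p)
    return np.array([q, p])

def g_inv(y):
    # Exact inverse of g: undo the shears in reverse order.
    q, p = y
    q = q - 0.5 * np.sin(p)
    p = p - np.tanh(q)
    return np.array([q, p])

def L(x):
    # L = R o g^{-1} o R o g satisfies R o L o R = L^{-1}, i.e. L is R-reversible.
    return R @ g_inv(R @ g(x))

def f(x):
    # f = R o L is the involution depicted in Figure 1.
    return R @ L(x)

x = np.array([0.7, -1.3])
is_involution = np.allclose(f(f(x)), x)   # checks R o L o R o L = identity at x
is_nontrivial = not np.allclose(L(x), x)  # L itself is not the identity map
print(is_involution, is_nontrivial)
```

The check works for any invertible `g`, since $f = g^{-1} \circ R \circ g$ and $R \circ R$ is the identity.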

Theorems & Definitions (3)

Definition 3.1
Theorem 4.1
Definition 5.1
