Optimising two-block averaging kernels to speed up Markov chains

Ryan J. Y. Lim; Michael C. H. Choi

TL;DR

The paper proposes several algorithmic approximations, including majorisation-minimisation and coordinate descent schemes, as computationally feasible alternatives to exhaustive combinatorial search. Numerical experiments show that optimal cuts under the two objectives (KL divergence and Frobenius distance to stationarity) can substantially reduce total variation distance to stationarity.

Abstract

We study the problem of selecting optimal two-block partitions to accelerate the mixing of finite Markov chains under group-averaging transformations. The main objectives considered are the Kullback-Leibler (KL) divergence and the Frobenius distance to stationarity. We establish explicit connections between these objectives and the induced projection chain. In the case of the KL divergence, this reduction yields explicit decay rates in terms of the log-Sobolev constant. For the Frobenius distance, we identify a Cheeger-type functional that characterises optimal cuts. This formulation recasts two-block selection as a structured combinatorial optimisation problem admitting difference-of-submodular decompositions. We further propose several algorithmic approximations, including majorisation-minimisation and coordinate descent schemes, as computationally feasible alternatives to exhaustive combinatorial search. Our numerical experiments reveal that optimal cuts under the two objectives can substantially reduce total variation distance to stationarity and demonstrate the practical effectiveness of the proposed approximation algorithms.
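To make the two-block selection problem concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the averaging kernel $G$ is the within-block Gibbs kernel, $G(x,y) = \pi(y)/\pi(\mathcal{O}(x))$ when $x$ and $y$ lie in the same block and $0$ otherwise, and it scores a cut by the unweighted Frobenius distance between $GPG$ and $\Pi$ (the matrix whose rows all equal $\pi$); the paper's exact normalisation may differ. The toy Metropolis chain and all function names are hypothetical, chosen only for illustration.

```python
import itertools
import numpy as np


def metropolis_chain(pi):
    """Lazy Metropolis walk on a path graph, reversible with respect to pi."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.25 * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()
    return P


def gibbs_kernel(pi, blocks):
    """Assumed form of G: resample from pi restricted to the current block."""
    n = len(pi)
    G = np.zeros((n, n))
    for block in blocks:
        idx = np.array(block)
        # Every row of the block equals pi restricted to, and normalised on, the block.
        G[np.ix_(idx, idx)] = pi[idx] / pi[idx].sum()
    return G


def worst_case_tv(K, pi, steps):
    """Maximum over starting states of the TV distance between K^steps(x, .) and pi."""
    K_l = np.linalg.matrix_power(K, steps)
    return 0.5 * np.abs(K_l - pi).sum(axis=1).max()


def best_two_block_cut(P, pi):
    """Exhaustive search over two-block partitions, minimising ||GPG - Pi||_F."""
    n = len(pi)
    Pi = np.tile(pi, (n, 1))
    best = None
    # Subsets of size at most n // 2 cover every partition {S, complement of S}.
    for r in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), r):
            complement = [x for x in range(n) if x not in S]
            G = gibbs_kernel(pi, [list(S), complement])
            K = G @ P @ G
            score = np.linalg.norm(K - Pi, "fro")
            if best is None or score < best[0]:
                best = (score, S, K)
    return best


# Toy bimodal target on six states (illustrative data, not from the paper).
pi = np.array([3.0, 2.0, 1.0, 1.0, 2.0, 3.0]) / 12.0
P = metropolis_chain(pi)
score, S, K = best_two_block_cut(P, pi)

print("best cut S:", S, "Frobenius score:", score)
for steps in (1, 5, 10):
    print(steps, worst_case_tv(P, pi, steps), worst_case_tv(K, pi, steps))
```

The exhaustive loop visits on the order of $2^{n-1}$ cuts, which is the cost that the majorisation-minimisation and coordinate descent schemes mentioned in the abstract are meant to avoid.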

Paper Structure (28 sections, 40 theorems, 188 equations, 5 figures, 2 tables)

Introduction
Related works
Preliminaries and definitions
Eigenvalues and Cheeger's constant
Submodular functions, KL divergence and entropy
Gibbs kernel
Relationship between $GPG$ and $\overline{P}$ in KL divergence
Decay rate of the KL divergence of $(GPG)^l$ from $\Pi$ and the case of $k=2$ orbits
Decay rate of the KL divergence of $(GP)^l$ from $\Pi$ and $(PG)^l$
Optimisation in Frobenius norm
Case of $k=2$ orbits for $GP$
Case of $k=2$ for $GPG$
A recursive construction for minimising the Frobenius norm
Reducing the squared-Frobenius norm to $\mathcal{O}(k)$ via $GP$ and $GPG$ with $k$ orbits
Example: improving the lazy simple random walk on the $d$-dimensional hypercube via Cheeger's cut and $G_S P G_S$
...and 13 more sections

Key Result

Proposition 3.1

For any $P \in \mathcal{S}(\pi)$ and any partition $\mathcal{X} = \bigsqcup_{i=1}^k \mathcal{O}_i$, it holds that for $l \in \mathds{N}$ where $x \in \mathcal{O}_i$, $y \in \mathcal{O}_j.$

Figures (5)

Figure 1: Plot of worst-case TV distance for $G_S P G_S$ chosen amongst different criteria
Figure 2: Cut visualisation by magnetisation for the Curie--Weiss model with $d=4$.
Figure 3: Plot of worst-case TV distance for $G_S P$ chosen amongst different criteria
Figure 4: Cut visualisation by magnetisation for the Curie--Weiss model with $d=4$.
Figure 5: Comparison of cuts between the true Frobenius objective via brute-force search and the 1/2-approximate minimiser given in Propositions \ref{prop:1/2-approx} and \ref{prop:1/2-approxGPG}

Theorems & Definitions (70)

Proposition 3.1
proof
Proposition 3.2
proof
Proposition 3.3
proof
Corollary 3.4
Proposition 3.5
proof
Corollary 3.6
...and 60 more
