Information-theoretic minimax and submodular optimization algorithms for multivariate Markov chains

Zheyuan Lai, Michael C. H. Choi

TL;DR

This paper addresses robust approximation of high-dimensional multivariate Markov chains on product spaces by factorized transitions, formulating a minimax objective that minimizes the worst-case information loss measured by KL divergence. It shows that the problem can be transformed into a concave maximization over the probability simplex via strong duality and averaging/projection (Pythagorean) identities, and it casts the optimization as a two-player information-theoretic game in which a mixed-strategy Nash equilibrium is guaranteed to exist. The authors develop a projected subgradient method and a two-layer subgradient-greedy algorithm (max-min-max) that leverages orthant submodularity to compute near-optimal factorizations and weight mixtures, with provable guarantees. Numerical experiments on the Curie-Weiss and Bernoulli-Laplace models demonstrate the practicality of the approach and reveal sparse, interpretable optimal structures that balance model fidelity and tractability. The work contributes theoretical foundations and scalable algorithms for robustly aggregating and factorizing multivariate Markov dynamics, with potential applications in MCMC design and high-dimensional stochastic modeling.

Abstract

We study an information-theoretic minimax problem for finite multivariate Markov chains on $d$-dimensional product state spaces. Given a family $\mathcal B=\{P_1,\ldots,P_n\}$ of $\pi$-stationary transition matrices and a class $\mathcal F = \mathcal{F}(\mathbf{S})$ of factorizable models induced by a partition $\mathbf S$ of the coordinate set $[d]$, we seek to minimize the worst-case information loss by analyzing $$\min_{Q\in\mathcal F}\max_{P\in\mathcal B} D_{\mathrm{KL}}^{\pi}(P\|Q),$$ where $D_{\mathrm{KL}}^{\pi}(P\|Q)$ is the $\pi$-weighted KL divergence from $Q$ to $P$. We recast this minimax problem as a concave maximization over the $n$-probability-simplex via strong duality and Pythagorean identities that we derive. This leads us to formulate an information-theoretic game, to show that a mixed-strategy Nash equilibrium always exists, and to propose a projected subgradient algorithm that approximately solves the minimax problem with provable guarantees. Transforming the minimax problem into an orthant submodular function in $\mathbf{S}$ motivates us to consider a max-min-max submodular optimization problem and to investigate a two-layer subgradient-greedy procedure that approximately solves this generalization. Numerical experiments for Markov chains on the Curie-Weiss and Bernoulli-Laplace models illustrate the practicality of the proposed algorithms and reveal sparse optimal structures in these examples.
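
To make the dual reformulation concrete, here is a minimal Python sketch of projected subgradient ascent over the probability simplex. It is an illustration under stated assumptions, not the paper's algorithm: the factorization constraint $Q\in\mathcal F$ is dropped, so $Q$ ranges over all transition matrices and the inner minimizer of $\sum_i w_i D_{\mathrm{KL}}^{\pi}(P_i\|Q)$ is the mixture $\sum_i w_i P_i$; the $\pi$-weighted KL divergence is assumed to be $\sum_x \pi(x)\sum_y P(x,y)\log\frac{P(x,y)}{Q(x,y)}$; and the function names, step-size schedule, and the three 2-state example chains are hypothetical.

```python
import numpy as np

def kl_pi(P, Q, pi):
    """Assumed definition: sum_x pi(x) sum_y P(x,y) log(P(x,y)/Q(x,y))."""
    mask = P > 0
    ratio = np.ones_like(P)
    ratio[mask] = P[mask] / Q[mask]
    return float(np.sum(pi[:, None] * P * np.log(ratio)))

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def minimax_weights(Ps, pi, steps=500, eta0=0.5):
    """Projected subgradient ascent on h(w) = min_Q sum_i w_i D(P_i || Q).

    Assumption: Q is unconstrained, so the inner minimizer is the mixture
    sum_i w_i P_i. The vector (D(P_i || Q_w))_i is then a supergradient of
    the concave function h at w.
    """
    n = len(Ps)
    w = np.full(n, 1.0 / n)
    best_w, best_val = w.copy(), -np.inf
    for t in range(1, steps + 1):
        Q = sum(wi * Pi for wi, Pi in zip(w, Ps))      # inner minimizer at w
        g = np.array([kl_pi(Pi, Q, pi) for Pi in Ps])  # supergradient of h at w
        val = float(w @ g)                             # h(w)
        if val > best_val:
            best_w, best_val = w.copy(), val
        w = project_simplex(w + eta0 / np.sqrt(t) * g)  # ascent step, then project
    return best_w, best_val

# Hypothetical example: three symmetric 2-state chains, all stationary
# with respect to the uniform distribution and with strictly positive entries.
pi = np.array([0.5, 0.5])
Ps = [
    np.array([[0.9, 0.1], [0.1, 0.9]]),
    np.array([[0.2, 0.8], [0.8, 0.2]]),
    np.array([[0.5, 0.5], [0.5, 0.5]]),
]
w_star, value = minimax_weights(Ps, pi)
Q_star = sum(wi * Pi for wi, Pi in zip(w_star, Ps))
worst_case = max(kl_pi(Pi, Q_star, pi) for Pi in Ps)
print("weights:", w_star)
print("dual value:", value, "worst-case loss of the mixture:", worst_case)
```

By weak duality the worst-case loss of any $Q$ is at least the dual value, so the gap between the two printed numbers gives a simple check on how close the iterate is to a saddle point. The example uses strictly positive matrices so that every divergence stays finite when a weight reaches the boundary of the simplex.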

Paper Structure

This paper contains 25 sections, 11 theorems, 120 equations, 6 figures, 3 tables, 2 algorithms.

Key Result

Lemma 2.1

For given $\mathbf{w} \in \mathcal{S}_n$, $\pi \in \mathcal{P}(\mathcal{X})$, $P_i, Q \in \mathcal{L}(\mathcal{X})$ for $i \in \llbracket n \rrbracket$ where $P_i$ are all $\pi$-stationary, we choose mutually disjoint sets $S_1, \ldots, S_m$ with $\sqcup_{i=1}^m S_i = \llbracket d \rrbracket$, and t... In particular, we have the following minimization result:

Figures (6)

  • Figure 1: Convergence of the projected subgradient algorithm for both models ($d=5$).
  • Figure 2: Trajectory plot of the projected subgradient algorithm for both models (incl. lazy chains).
  • Figure 3: Trajectory plots of the projected subgradient algorithm for both models (higher dimension).
  • Figure 4: Trajectory plot of Algorithm \ref{alg:max_max} for both models ($d=5$).
  • Figure 5: Trajectory plot of Algorithm \ref{alg:max_max} for both models (incl. lazy matrices).
  • ...and 1 more figure

Theorems & Definitions (23)

  • Lemma 2.1
  • proof
  • Corollary 2.2
  • Theorem 2.3: Submodularity of some information-theoretic functions in Markov chain theory
  • proof
  • Lemma 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem 4.1: Existence of mixed strategy Nash equilibrium
  • ...and 13 more