Information-theoretic minimax and submodular optimization algorithms for multivariate Markov chains
Zheyuan Lai, Michael C. H. Choi
TL;DR
This paper addresses the robust approximation of high-dimensional multivariate Markov chains on product spaces by factorized transition matrices. It formulates a minimax objective that minimizes the worst-case information loss, as measured by the π-weighted KL divergence. It shows that the problem can be transformed into a concave maximization over the probability simplex via strong duality and averaging/projection (Pythagorean) identities. It also casts the optimization as a two-player information-theoretic game in which a mixed-strategy Nash equilibrium is guaranteed to exist. The authors develop a projected subgradient method with a provable guarantee. They also develop a two-layer subgradient-greedy algorithm for a max-min-max generalization that leverages orthant submodularity in the partition, and use both methods to compute near-optimal factorizations and mixture weights. They demonstrate the practicality of the approach through numerical experiments on Curie-Weiss and Bernoulli-Laplace models, revealing sparse, interpretable optimal structures that balance model fidelity and tractability. The work contributes theoretical foundations and scalable algorithms for robustly aggregating and factorizing multivariate Markov dynamics, with potential applications in MCMC design and high-dimensional stochastic modeling.
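The Python sketch below is a small, hedged illustration of the projection step summarized above. It checks a Pythagorean-type decomposition numerically and rests on two assumptions:

- The π-weighted KL divergence is $D^{\pi}_{\mathrm{KL}}(P\|Q)=\sum_x\pi(x)\sum_y P(x,y)\log\frac{P(x,y)}{Q(x,y)}$.
- The closest factorized model is the product of what we call here keep-$S$ chains, $P^{(S)}(a,b)=\frac{1}{\pi^{(S)}(a)}\sum_{x:x^S=a}\pi(x)\sum_{y:y^S=b}P(x,y)$, one per block of the partition.

The following are all hypothetical choices for illustration, not the authors' code or experimental setup: the test chain (random-scan Metropolis on $\{0,1\}^3$ with a Curie-Weiss-type target), the partition, the comparison factor `lazy_uniform`, and every function name.

```python
import itertools
import math

def curie_weiss_metropolis(d=3, beta=1.0):
    """Hypothetical test chain: random-scan single-site Metropolis on {0,1}^d
    targeting pi(x) proportional to exp(beta/d * sum_{i<j} s_i s_j), s = 2x - 1."""
    states = list(itertools.product((0, 1), repeat=d))
    idx = {x: k for k, x in enumerate(states)}
    wts = []
    for x in states:
        s = [2 * v - 1 for v in x]  # spins in {-1, +1}
        wts.append(math.exp(beta / d * sum(s[i] * s[j] for i in range(d) for j in range(i + 1, d))))
    pi = [v / sum(wts) for v in wts]
    P = [[0.0] * len(states) for _ in states]
    for k, x in enumerate(states):
        for i in range(d):
            j = idx[x[:i] + (1 - x[i],) + x[i + 1:]]  # flip coordinate i
            P[k][j] = min(1.0, pi[j] / pi[k]) / d
        P[k][k] = 1.0 - sum(P[k])  # rejection mass stays on the diagonal
    return states, pi, P

def restrict(x, block):
    """Coordinates of state x indexed by the block S."""
    return tuple(x[i] for i in block)

def keep_chain(states, pi, P, block):
    """Keep-S chain P^(S)(a, b) = sum_{x^S=a} pi(x) sum_{y^S=b} P(x, y) / pi^(S)(a)."""
    marg, flow = {}, {}
    for k, x in enumerate(states):
        a = restrict(x, block)
        marg[a] = marg.get(a, 0.0) + pi[k]
        for j, y in enumerate(states):
            key = (a, restrict(y, block))
            flow[key] = flow.get(key, 0.0) + pi[k] * P[k][j]
    return {key: v / marg[key[0]] for key, v in flow.items()}

def product_chain(states, blocks, factors):
    """Factorized model Q(x, y) = prod_i Q_i(x^{S_i}, y^{S_i})."""
    return [[math.prod(F[(restrict(x, b), restrict(y, b))] for b, F in zip(blocks, factors))
             for y in states] for x in states]

def kl_pi(pi, P, Q):
    """pi-weighted KL divergence; assumes Q(x, y) > 0 wherever P(x, y) > 0."""
    n = len(pi)
    return sum(pi[k] * P[k][j] * math.log(P[k][j] / Q[k][j])
               for k in range(n) for j in range(n) if P[k][j] > 0)

def lazy_uniform(block_states):
    """Hypothetical comparison factor with full support on a block's states."""
    off = 0.5 / (len(block_states) - 1)
    return {(a, b): 0.5 if a == b else off for a in block_states for b in block_states}

states, pi, P = curie_weiss_metropolis(d=3, beta=1.0)
blocks = [(0,), (1, 2)]  # hypothetical partition S of the coordinates {0, 1, 2}
Q_star = product_chain(states, blocks, [keep_chain(states, pi, P, b) for b in blocks])
Q_alt = product_chain(states, blocks,
                      [lazy_uniform(list(itertools.product((0, 1), repeat=len(b)))) for b in blocks])
lhs = kl_pi(pi, P, Q_alt)
rhs = kl_pi(pi, P, Q_star) + kl_pi(pi, Q_star, Q_alt)
print(lhs, rhs)
```

If the identity holds as assumed, `lhs` and `rhs` agree up to rounding. Because the extra term $D^{\pi}_{\mathrm{KL}}(Q^\star\|Q)$ is nonnegative, no factorized $Q$ with positive entries is closer to $P$ than `Q_star`.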
Abstract
We study an information-theoretic minimax problem for finite multivariate Markov chains on $d$-dimensional product state spaces. Given a family $\mathcal B=\{P_1,\ldots,P_n\}$ of $\pi$-stationary transition matrices and a class $\mathcal F = \mathcal{F}(\mathbf{S})$ of factorizable models induced by a partition $\mathbf S$ of the coordinate set $[d]$, we seek to minimize the worst-case information loss by analyzing $$\min_{Q\in\mathcal F}\max_{P\in\mathcal B} D_{\mathrm{KL}}^{\pi}(P\|Q),$$ where $D_{\mathrm{KL}}^{\pi}(P\|Q)$ is the $\pi$-weighted KL divergence from $Q$ to $P$. We recast this minimax problem as a concave maximization problem over the $n$-probability simplex, via strong duality and Pythagorean identities that we derive. This leads us to formulate an information-theoretic game, for which we show that a mixed-strategy Nash equilibrium always exists. We also propose a projected subgradient algorithm that approximately solves the minimax problem with a provable guarantee. Transforming the minimax problem into an orthant submodular function of $\mathbf{S}$ motivates us to consider a max-min-max submodular optimization problem. We investigate a two-layer subgradient-greedy procedure that approximately solves this generalization. Numerical experiments for Markov chains on the Curie-Weiss and Bernoulli-Laplace models illustrate the practicality of the proposed algorithms and reveal sparse optimal structures in these examples.
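The dual side of the minimax problem can be sketched as projected supergradient ascent over the simplex on $g(w)=\min_{Q\in\mathcal F}\sum_{k}w_k D^{\pi}_{\mathrm{KL}}(P_k\|Q)$. The sketch assumes that, for a common $\pi$, the averaging and Pythagorean identities make the inner minimizer $Q_w$ equal to the product of keep-$S$ chains of the mixture $\bar P_w=\sum_k w_k P_k$. Under that assumption:

- $(D^{\pi}_{\mathrm{KL}}(P_k\|Q_w))_k$ is a supergradient of $g$ at $w$.
- $g(w)$ is a lower bound on the minimax value.
- $\max_k D^{\pi}_{\mathrm{KL}}(P_k\|Q_w)$ is an upper bound on the minimax value.

The following are all hypothetical: the family $\mathcal B$ (Metropolis, Glauber and lazy Metropolis chains sharing one Curie-Weiss-type $\pi$), the step size $\eta/\sqrt t$, the iteration count, and every function name. This is not the authors' algorithm, tuning, or guarantee.

```python
import itertools
import math

def make_family(d=3, beta=1.0):
    """Hypothetical family B: three pi-reversible (hence pi-stationary) single-site
    chains on {0,1}^d sharing one Curie-Weiss-type target pi."""
    states = list(itertools.product((0, 1), repeat=d))
    idx = {x: k for k, x in enumerate(states)}
    wts = []
    for x in states:
        s = [2 * v - 1 for v in x]  # spins in {-1, +1}
        wts.append(math.exp(beta / d * sum(s[i] * s[j] for i in range(d) for j in range(i + 1, d))))
    pi = [v / sum(wts) for v in wts]
    rules = [lambda r: min(1.0, r),        # Metropolis acceptance
             lambda r: r / (1.0 + r),      # Glauber (heat-bath) acceptance
             lambda r: 0.5 * min(1.0, r)]  # lazy Metropolis
    family = []
    for accept in rules:
        P = [[0.0] * len(states) for _ in states]
        for k, x in enumerate(states):
            for i in range(d):
                j = idx[x[:i] + (1 - x[i],) + x[i + 1:]]  # flip coordinate i
                P[k][j] = accept(pi[j] / pi[k]) / d
            P[k][k] = 1.0 - sum(P[k])  # rejection / holding mass
        family.append(P)
    return states, pi, family

def restrict(x, block):
    return tuple(x[i] for i in block)

def closest_factorized(states, pi, P, blocks):
    """Product over blocks S of keep-S chains
    P^(S)(a, b) = sum_{x^S=a} pi(x) sum_{y^S=b} P(x, y) / pi^(S)(a)."""
    factors = []
    for block in blocks:
        marg, flow = {}, {}
        for k, x in enumerate(states):
            a = restrict(x, block)
            marg[a] = marg.get(a, 0.0) + pi[k]
            for j, y in enumerate(states):
                key = (a, restrict(y, block))
                flow[key] = flow.get(key, 0.0) + pi[k] * P[k][j]
        factors.append({key: v / marg[key[0]] for key, v in flow.items()})
    return [[math.prod(F[(restrict(x, b), restrict(y, b))] for b, F in zip(blocks, factors))
             for y in states] for x in states]

def kl_pi(pi, P, Q):
    """pi-weighted KL divergence; assumes Q(x, y) > 0 wherever P(x, y) > 0."""
    n = len(pi)
    return sum(pi[k] * P[k][j] * math.log(P[k][j] / Q[k][j])
               for k in range(n) for j in range(n) if P[k][j] > 0)

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-and-threshold)."""
    u, css, theta = sorted(v, reverse=True), 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        if uk - (css - 1.0) / k > 0:
            theta = (css - 1.0) / k
    return [max(x - theta, 0.0) for x in v]

def minimax_subgradient(states, pi, family, blocks, iters=300, eta=0.5):
    """Projected supergradient ascent on g(w) = min_{Q in F(S)} sum_k w_k D(P_k || Q)."""
    n, m = len(family), len(states)
    w = [1.0 / n] * n
    lower, upper, w_best = -math.inf, math.inf, w
    for t in range(1, iters + 1):
        Pbar = [[sum(wk * P[a][b] for wk, P in zip(w, family)) for b in range(m)]
                for a in range(m)]
        Q = closest_factorized(states, pi, Pbar, blocks)  # assumed inner minimizer Q_w
        losses = [kl_pi(pi, P, Q) for P in family]        # supergradient of g at w
        g = sum(wk * lk for wk, lk in zip(w, losses))
        if g > lower:
            lower, w_best = g, w                          # g(w) <= minimax value
        upper = min(upper, max(losses))                   # minimax value <= max_k D(P_k || Q_w)
        w = project_simplex([wk + eta / math.sqrt(t) * lk for wk, lk in zip(w, losses)])
    return lower, upper, w_best

states, pi, family = make_family(d=3, beta=1.0)
lower, upper, w = minimax_subgradient(states, pi, family, blocks=[(0,), (1, 2)])
print(f"{lower:.6g} <= minimax value <= {upper:.6g}, weights = {w}")
```

Tracking both bounds gives a simple stopping certificate: the gap `upper - lower` bounds how far either estimate can be from the minimax value. Here every chain puts positive mass on all single-site flips, so each $Q_w$ covers every $P_k$'s support and the divergences stay finite.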
