Information-theoretic coordinate subset and partition selection of multivariate Markov chains via submodular optimization

Zheyuan Lai, Michael C. H. Choi

TL;DR

This work studies how to select coordinate subsets or partitions of a finite multivariate Markov chain so as to minimize the information lost when passing to a lower-dimensional description. By revealing submodular and supermodular structure in the entropy rate and in the information-theoretic distances to factorizability, independence, and stationarity, the authors develop greedy-based algorithms with performance guarantees, including for k-submodular settings arising in partition selection. Along the way they introduce a generalized distorted greedy algorithm and apply it to several of these objectives. Numerical experiments on the Curie–Weiss and Bernoulli–Laplace models illustrate how subset and partition selection can aid sampling, model reduction, and the understanding of coordinate interactions in multivariate Markov chains.
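For readers unfamiliar with the distorted greedy idea that the paper generalizes, here is a minimal sketch of the original (non-generalized) distorted greedy algorithm of Harshaw et al. (2019). It targets max g(S) - c(S) subject to |S| <= k, where g is monotone submodular with g(empty set) = 0 and c is a nonnegative additive cost. This is not the authors' generalized version; the coverage function, cost table, and names such as `distorted_greedy` are hypothetical toy choices for illustration only.

```python
import math
from typing import Callable, FrozenSet, Hashable, Iterable


def distorted_greedy(
    ground: Iterable[Hashable],
    g: Callable[[FrozenSet], float],
    cost: Callable[[Hashable], float],
    k: int,
) -> FrozenSet:
    """Distorted greedy for max g(S) - sum_{e in S} cost(e) subject to |S| <= k.

    Assumes g is monotone submodular with g(empty) = 0 and cost is nonnegative.
    """
    ground = list(ground)
    selected: FrozenSet = frozenset()
    for i in range(k):
        # Distortion factor (1 - 1/k)^(k - (i + 1)) down-weights early gains of g.
        weight = (1.0 - 1.0 / k) ** (k - (i + 1))
        base = g(selected)
        best_e, best_score = None, -math.inf
        for e in ground:
            if e in selected:
                continue
            score = weight * (g(selected | {e}) - base) - cost(e)
            if score > best_score:
                best_e, best_score = e, score
        # The element is added only if its distorted gain is strictly positive.
        if best_e is not None and best_score > 0:
            selected = selected | {best_e}
    return selected


# Hypothetical toy instance: coverage objective with per-element costs.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}
costs = {"a": 0.5, "b": 0.5, "c": 2.0, "d": 0.1}


def coverage(S: FrozenSet) -> int:
    return len(set().union(*(covers[e] for e in S)))


chosen = distorted_greedy(covers, coverage, costs.__getitem__, k=2)
```

On this toy instance the algorithm selects {"a", "b"}, with coverage 4 and total cost 1.0. The known guarantee for this original version is g(S) - c(S) >= (1 - 1/e) g(OPT) - c(OPT).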

Abstract

We study the problem of optimally projecting the transition matrix of a finite ergodic multivariate Markov chain onto a lower-dimensional state space, as well as the problem of finding an optimal partition of coordinates such that the factorized Markov chain incurs minimal information loss relative to the original multivariate chain. Specifically, we seek to construct a Markov chain that optimizes various information-theoretic criteria under cardinality constraints. These criteria include the entropy rate and the information-theoretic distances to factorizability, independence, and stationarity. We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest. Finally, we illustrate the theory and algorithms through extensive numerical experiments, with publicly available code, on multivariate Markov chains associated with the Bernoulli–Laplace and Curie–Weiss models.
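The criteria named in the abstract are built from two standard quantities for a transition matrix P with stationary distribution pi: the entropy rate H(P) = -sum_x pi(x) sum_y P(x,y) log P(x,y), and the KL divergence rate D^pi_KL(P || L) = sum_x pi(x) sum_y P(x,y) log(P(x,y)/L(x,y)). The sketch below computes both with natural logarithms. As an assumption, "distance to stationarity" is taken to be the KL divergence rate from P to the matrix Pi whose rows all equal pi; the paper's exact conventions may differ. The caller must supply the stationary pi; it is not computed here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def entropy_rate(P: Matrix, pi: Sequence[float]) -> float:
    """H(P) = -sum_x pi(x) sum_y P(x, y) log P(x, y), with 0 log 0 := 0."""
    return -sum(
        pi[x] * sum(p * math.log(p) for p in row if p > 0)
        for x, row in enumerate(P)
    )


def kl_rate(P: Matrix, L: Matrix, pi: Sequence[float]) -> float:
    """D^pi_KL(P || L) = sum_x pi(x) sum_y P(x, y) log(P(x, y) / L(x, y))."""
    total = 0.0
    for x, (prow, lrow) in enumerate(zip(P, L)):
        for p, l in zip(prow, lrow):
            if p == 0:
                continue  # convention: 0 log(0 / l) := 0
            if l == 0:
                # P is not absolutely continuous w.r.t. L (pi > 0 for ergodic chains).
                return math.inf
            total += pi[x] * p * math.log(p / l)
    return total


def distance_to_stationarity(P: Matrix, pi: Sequence[float]) -> float:
    """Assumed definition: KL divergence rate from P to Pi, whose rows all equal pi."""
    Pi = [list(pi) for _ in P]
    return kl_rate(P, Pi, pi)
```

For example, the two-state chain P = [[0.9, 0.1], [0.2, 0.8]] has stationary distribution pi = [2/3, 1/3], and its distance to stationarity is strictly positive, while the chain whose rows already equal pi is at distance zero.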

Paper Structure

This paper contains 33 sections, 39 theorems, 165 equations, 12 figures, 19 tables, and 4 algorithms.

Key Result

Theorem 2.1

Let $\pi \in \mathcal{P}(\mathcal{X})$, let $P, L \in \mathcal{L}(\mathcal{X})$, and let $S \subseteq \llbracket d \rrbracket$. Then:

Figures (12)

  • Figure 1: The leave-one-out mixing analysis of the Curie–Weiss model ($d=8$).
  • Figure 2: Empirical CDF comparison of MCMC simulation. Here, "Tensor samples" refer to the 1000 samples generated by the transition matrix $(P^{(-4)})^{10} \otimes (P^{(4)})^{10}$.
  • Figure 3: Entropy rate against subset size for the three algorithms (B–L model).
  • Figure 4: Entropy rate against subset size for the three algorithms (C–W model).
  • Figure 5: Distance to factorizability against subset size for the three algorithms (C–W model).
  • ...and 7 more figures
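Figure 2 relies on two operations: projecting P onto a coordinate block (for example $P^{(4)}$ and $P^{(-4)}$) and recombining blocks via a tensor product. The sketch below implements both for kernels stored as nested dicts over tuple-valued states. As an assumption, the projection uses the stationarity-weighted form $P^{(S)}(x^S, y^S) = \sum_{x^{-S}, y^{-S}} \frac{\pi(x)}{\pi^{(S)}(x^S)} P(x, y)$; check the paper for its exact definition. The names `project` and `tensor` are hypothetical. For a product chain, projecting onto either block should recover that block's kernel.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]
Kernel = Dict[State, Dict[State, float]]  # P[x][y] = transition probability


def project(P: Kernel, pi: Dict[State, float], S: Sequence[int]) -> Kernel:
    """Assumed keep-S projection:
    P^{(S)}(x_S, y_S) = sum over x_{-S}, y_{-S} of pi(x) / pi^{(S)}(x_S) * P(x, y).
    Requires pi^{(S)}(x_S) > 0, which holds when pi is positive (ergodic chain)."""
    S = tuple(S)
    marginal: Dict[State, float] = defaultdict(float)  # pi^{(S)}
    for x, px in pi.items():
        marginal[tuple(x[i] for i in S)] += px
    Q: Dict[State, Dict[State, float]] = defaultdict(lambda: defaultdict(float))
    for x, row in P.items():
        xs = tuple(x[i] for i in S)
        weight = pi[x] / marginal[xs]  # conditional law of x_{-S} given x_S
        for y, pxy in row.items():
            Q[xs][tuple(y[i] for i in S)] += weight * pxy
    return {xs: dict(row) for xs, row in Q.items()}


def tensor(P: Kernel, Q: Kernel) -> Kernel:
    """(P ⊗ Q)((x, u), (y, v)) = P(x, y) * Q(u, v): the two blocks move independently.
    Joint states are concatenated tuples, so the coordinates of P come first."""
    return {
        x + u: {y + v: pxy * quv for y, pxy in P[x].items() for v, quv in Q[u].items()}
        for x in P
        for u in Q
    }


# Hypothetical two-coordinate example: an independent product of binary chains.
P1 = {(0,): {(0,): 0.9, (1,): 0.1}, (1,): {(0,): 0.2, (1,): 0.8}}
P2 = {(0,): {(0,): 0.5, (1,): 0.5}, (1,): {(0,): 0.5, (1,): 0.5}}
pi1 = {(0,): 2 / 3, (1,): 1 / 3}
pi2 = {(0,): 0.5, (1,): 0.5}
P = tensor(P1, P2)
pi = {x + u: pi1[x] * pi2[u] for x in pi1 for u in pi2}
P_keep0 = project(P, pi, [0])  # expected to recover P1
P_keep1 = project(P, pi, [1])  # expected to recover P2
```

Because `tensor` concatenates states, the recombined chain lists the coordinates of its first argument first; if S is not a prefix of the coordinate order, the coordinates must be permuted back before comparing with the original chain.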

Theorems & Definitions (58)

  • Theorem 2.1: Partition lemma (Theorem 2.4 of choi2024ratedistortionframeworkmcmcalgorithms)
  • Lemma 2.2
  • Proof
  • Theorem 2.3: Submodularity of some functions in Markov chain theory (Proposition 2.6 of choi2024ratedistortionframeworkmcmcalgorithms)
  • Theorem 2.4: Supermodularity and monotonicity of the distance to independence of $P^{(-S)}$
  • Proof
  • Theorem 2.5: Transform a non-monotone submodular $f$ to a monotone submodular $g$ (Proposition 14.18 of korte2011combinatorial)
  • Example 2.6: Sensor placement problem (Section 5.2 of NIPS2015_f770b62b)
  • Theorem 2.7: Characterization of $k$-submodularity (Theorem $7$ of MR3549595)
  • Lemma 2.8
  • ...and 48 more
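Several results listed above (for example Theorems 2.3 and 2.4) establish submodularity or supermodularity, which is what makes greedy selection worthwhile: for a monotone submodular f with f(empty set) = 0, the classical greedy algorithm of Nemhauser, Wolsey, and Fisher achieves at least a (1 - 1/e) fraction of the optimum under a cardinality constraint, and minimizing a supermodular objective is equivalent to maximizing its (submodular) negation. The sketch below shows that classical greedy together with a brute-force diminishing-returns check. It is not the paper's algorithm, and the coverage objective is a hypothetical stand-in for the Markov chain criteria.

```python
import itertools
from typing import Callable, FrozenSet, Hashable, List


def is_submodular(f: Callable[[FrozenSet], float], ground: List[Hashable], tol: float = 1e-12) -> bool:
    """Brute-force check of f(A + e) - f(A) >= f(B + e) - f(B) for A ⊆ B, e ∉ B.
    Exponential in len(ground): only for small sanity checks."""
    subsets = [frozenset(c) for r in range(len(ground) + 1) for c in itertools.combinations(ground, r)]
    for A, B in itertools.product(subsets, repeat=2):
        if not A <= B:
            continue
        for e in ground:
            if e not in B and f(A | {e}) - f(A) < f(B | {e}) - f(B) - tol:
                return False
    return True


def greedy(f: Callable[[FrozenSet], float], ground: List[Hashable], k: int) -> FrozenSet:
    """Classical greedy: k times, add the element with the largest marginal gain."""
    selected: FrozenSet = frozenset()
    for _ in range(min(k, len(ground))):
        base = f(selected)
        best = max((e for e in ground if e not in selected), key=lambda e: f(selected | {e}) - base)
        selected = selected | {best}
    return selected


# Hypothetical toy objective: set coverage, a standard monotone submodular function.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}
ground = list(covers)


def coverage(S: FrozenSet) -> int:
    return len(set().union(*(covers[e] for e in S)))


chosen = greedy(coverage, ground, k=2)
```

Here greedy first takes "a" (gain 3, the first maximizer in iteration order) and then "c" (gain 3), covering all six items. A quadratic function such as len(S) ** 2 fails the diminishing-returns check, which is a quick way to see that not every set function admits this guarantee.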