Near-Optimal Clustering in Mixture of Markov Chains

Junghyun Lee, Yassir Jedra, Alexandre Proutière, Se-Young Yun

Abstract

We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the stationary-weighted KL divergence between transition kernels. We then propose a two-stage algorithm: Stage I applies spectral clustering via a new injective Euclidean embedding for ergodic Markov chains, a contribution of independent interest enabling sharp concentration results; Stage II refines clusters with a single likelihood-based reassignment step. We prove that our algorithm achieves near-optimal clustering error with high probability under reasonable requirements on $T$ and $H$. Preliminary experiments support our approach, and we conclude with discussions of its limitations and extensions.
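Two ingredients named in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the stationary-weighted KL divergence takes the form $\sum_s \pi_P(s)\,\mathrm{KL}(P(s,\cdot)\,\|\,Q(s,\cdot))$, weighted by the first chain's stationary distribution, and that the Stage II step re-estimates one transition matrix per cluster and reassigns each trajectory to its maximum-likelihood cluster. The function names, the additive smoothing parameter `alpha`, and the use of given initial labels in place of the Stage I embedding are all hypothetical choices made for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of an ergodic transition matrix P (rows sum to 1)."""
    evals, evecs = np.linalg.eig(P.T)
    # Left eigenvector of P for the eigenvalue closest to 1.
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

def stationary_weighted_kl(P, Q):
    """sum_s pi_P(s) * KL(P(s, .) || Q(s, .)); assumes Q > 0 wherever P > 0."""
    pi = stationary_distribution(P)
    mask = P > 0
    # Entries with P = 0 contribute 0 to the KL sum.
    ratio = np.where(mask, P, 1.0) / np.where(mask, Q, 1.0)
    return float(np.sum(pi[:, None] * P * np.log(ratio)))

def transition_counts(traj, S):
    """S x S matrix of observed transition counts along one trajectory."""
    traj = np.asarray(traj)
    C = np.zeros((S, S))
    np.add.at(C, (traj[:-1], traj[1:]), 1.0)
    return C

def likelihood_reassign(trajs, labels, K, S, alpha=1.0):
    """One likelihood-based reassignment step starting from initial labels.

    alpha is an additive-smoothing constant (an assumption of this sketch).
    """
    counts = np.stack([transition_counts(t, S) for t in trajs])  # (T, S, S)
    labels = np.asarray(labels)
    log_P = np.empty((K, S, S))
    for k in range(K):
        Ck = counts[labels == k].sum(axis=0) + alpha
        log_P[k] = np.log(Ck / Ck.sum(axis=1, keepdims=True))
    # Log-likelihood of each trajectory's transitions under each cluster kernel.
    loglik = np.einsum("tij,kij->tk", counts, log_P)
    return loglik.argmax(axis=1)
```

With three "stay" trajectories and three "alternate" trajectories on two states, an initial labeling with one mistake is corrected by a single call to `likelihood_reassign`. The initial-state term of the likelihood is ignored here for simplicity.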

Paper Structure

This paper contains 74 sections, 22 theorems, 127 equations, 6 figures.

Key Result

theorem 1

Error Rate Lower Bound. Let $(\varepsilon, \delta) \in [0, 1] \times (0, 1/2]$. Then, a necessary condition for the existence of an $(\varepsilon, \beta, \delta)$-locally stable algorithm at $\Phi_T := (({\mathcal{M}}^{(k)})_{k \in [K]}, f, T)$ with $\beta \geq 2\sqrt{2} \varepsilon$ is as ... For an $(\varepsilon, \beta)$-asymptotically locally stable algorithm with the same $\beta$ as above, a ...

Figures (6)

  • Figure 1: Clustering error rates of our algorithm and that of kausik2023mixture on Cyclic-Bump MMC.
  • Figure 2: (Ablation #1) Clustering error on the synthetic MMC instance across EM iterations (up to 10). Panels vary across $T \in \{100, 200, \cdots, 600\}$; the x-axis varies across $H \in \{10, 20, \cdots, 50\}$. Darker red curves indicate more EM iterations; "Oracle" and "Stage I" are shown for reference.
  • Figure 3: (Ablation #2) Impact of $S$ and $T$ on the resulting clustering error and runtime.
  • Figure 4: (Ablation #3) Clustering error of the unknown-$K$ variant on the synthetic MMC instance across different $(c_1, c_2)$ settings. Panels vary across $c_1 \in \{10^{-3}, 10^{-2}, 10^{-1}\}$ (columns) and trajectory length $H$ (rows: top for $H \leq 1000$, bottom for $H \geq 1000$). Curves indicate different $c_2$ values; Stage I and Stage I+II are shown separately.
  • Figure 5: (Ablation #4) Clustering error on another synthetic MMC instance with varying $\gamma_\mathrm{ps}$, over $H \in \{100, 200, \cdots, 2000\}$.
  • ...and 1 more figure

Theorems & Definitions (43)

  • definition 1
  • theorem 1
  • proof : Proof sketch
  • Remark 1: Chernoff Information
  • Remark 2: Comparison to wang2023divergence
  • definition 2
  • Remark 3
  • Proposition 4.1
  • proof
  • theorem 2
  • ...and 33 more