Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models

Xiaoyuan Ma, Jordan Rodu

TL;DR

This paper provides an asymptotic distribution for the approximation error of the likelihood estimated by SHMM, proposes a novel algorithm called projected SHMM (PSHMM) that mitigates the problem of error propagation, and develops online learning variants of both SHMM and PSHMM that accommodate potential nonstationarity.
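
The online variants update moment estimates as new observations arrive. The paper's exact update rule is not reproduced here. What follows is a minimal sketch, assuming a generic exponentially weighted (forgetting-factor) recursion, which is one standard way to let moment estimates track a drifting process. The class name `OnlineMoments`, the parameter `gamma`, and the use of a lag-one cross moment are illustrative assumptions, and the third-order moment used by spectral methods is omitted for brevity.

```python
import numpy as np


class OnlineMoments:
    """Hypothetical online estimator of E[Y_t] and the lag-one moment E[Y_{t+1} Y_t^T].

    gamma=None uses step size 1/n (exact running averages, suited to stationary data);
    a fixed gamma in (0, 1) exponentially down-weights old observations (an assumption,
    not necessarily the paper's rule).
    """

    def __init__(self, k, gamma=None):
        self.gamma = gamma
        self.mean = np.zeros(k)         # estimate of E[Y_t]
        self.cross = np.zeros((k, k))   # estimate of E[Y_{t+1} Y_t^T]
        self.n = 0                      # observations seen so far
        self.n_pairs = 0                # consecutive (Y_t, Y_{t+1}) pairs seen so far
        self.prev = None

    def _step(self, count):
        # Running-average step 1/count, or the fixed forgetting factor if given.
        return 1.0 / count if self.gamma is None else self.gamma

    def update(self, y):
        y = np.asarray(y, dtype=float)
        self.n += 1
        self.mean += self._step(self.n) * (y - self.mean)
        if self.prev is not None:
            self.n_pairs += 1
            self.cross += self._step(self.n_pairs) * (np.outer(y, self.prev) - self.cross)
        self.prev = y


# Usage: with gamma=None the online estimates match the batch sample moments.
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 3))
est = OnlineMoments(k=3)
for y in Y:
    est.update(y)
print(np.allclose(est.mean, Y.mean(axis=0)))                   # True
print(np.allclose(est.cross, Y[1:].T @ Y[:-1] / (len(Y) - 1)))   # True
```

A fixed `gamma` trades variance for responsiveness to nonstationarity. Starting from zero biases the first few fixed-`gamma` estimates toward zero, a simplification a fuller implementation would handle.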

Abstract

The Baum-Welch (B-W) algorithm is the most widely accepted method for inferring hidden Markov models (HMMs). However, it is prone to getting stuck in local optima and can be too slow for many real-time applications. Spectral learning of HMMs (SHMM), based on the method of moments (MOM), has been proposed in the literature to overcome these obstacles. Despite its promise, asymptotic theory for SHMM has been elusive, and the long-run performance of SHMM can degrade due to unchecked propagation of error. In this paper, we (1) provide an asymptotic distribution for the approximation error of the likelihood estimated by SHMM, (2) propose a novel algorithm called projected SHMM (PSHMM) that mitigates the problem of error propagation, and (3) develop online learning variants of both SHMM and PSHMM that accommodate potential nonstationarity. We compare the performance of SHMM with that of PSHMM and of estimation through the B-W algorithm on both simulated data and data from real-world applications, and find that PSHMM not only retains the computational advantages of SHMM but also provides more robust estimation and forecasting.
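
Spectral learning replaces the iterative updates of B-W with closed-form moment computations followed by an SVD. As an illustration only, the sketch below implements the classic discrete-observation observable-operator construction of Hsu, Kakade and Zhang, not the projected continuous-observation version of rodu2014spectral that the paper builds on. It assumes column-stochastic transition and emission matrices and, so the result can be checked exactly, feeds in population moments; plugging in empirical moments from training data instead gives an estimate of $\Pr(x_{1:T})$ of the kind whose error the paper characterizes. All function names are hypothetical.

```python
import itertools

import numpy as np


def hmm_moments(pi, T, O):
    """Exact low-order moments of a discrete HMM.

    T[i, j] = Pr(h_{t+1}=i | h_t=j), O[x, j] = Pr(X_t=x | h_t=j), pi = law of h_1.
    """
    P1 = O @ pi                        # P1[i]     = Pr(X_1=i)
    P21 = O @ T @ np.diag(pi) @ O.T    # P21[i, j] = Pr(X_2=i, X_1=j)
    # P3x1[x][i, j] = Pr(X_3=i, X_2=x, X_1=j)
    P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(O.shape[0])]
    return P1, P21, P3x1


def spectral_fit(P1, P21, P3x1, k):
    """Observable-operator parameters from (population or empirical) moments."""
    U = np.linalg.svd(P21)[0][:, :k]   # top-k left singular vectors of P21
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    B = [U.T @ P3 @ np.linalg.pinv(U.T @ P21) for P3 in P3x1]
    return b1, binf, B


def spectral_likelihood(seq, b1, binf, B):
    """Pr(x_1, ..., x_T) = binf^T B_{x_T} ... B_{x_1} b1."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)


def forward_likelihood(seq, pi, T, O):
    """Reference value from the standard forward recursion."""
    alpha = O[seq[0]] * pi
    for x in seq[1:]:
        alpha = O[x] * (T @ alpha)
    return float(alpha.sum())


pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
b1, binf, B = spectral_fit(*hmm_moments(pi, T, O), k=2)
for seq in [(0,), (2, 1), (0, 2, 2, 1, 0)]:
    print(spectral_likelihood(seq, b1, binf, B), forward_likelihood(seq, pi, T, O))
total = sum(spectral_likelihood(s, b1, binf, B) for s in itertools.product(range(3), repeat=3))
print(round(total, 10))  # 1.0
```

With exact moments the two likelihood columns agree up to floating-point error, because $U^\top O$ is invertible when $O$ and $\mathbf{T}$ have full column rank and $\pi > 0$. With empirical moments the operators $B_x$ carry estimation error that is multiplied along the sequence, which is the error-propagation issue PSHMM targets.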

Paper Structure (39 sections, 2 theorems, 39 equations, 9 figures, 3 tables, 3 algorithms)

Key Result

Lemma 1

Figures (9)

  • Figure 1: Model structure of a standard HMM. $\{ h_t \}$ is a latent Markov chain that evolves according to the transition matrix $\mathbf{T}$. At each time step $t$, the observation $X_t$ is generated from the emission distribution associated with $h_t$.
  • Figure 2: Spectral estimation model of rodu2014spectral. In addition to the latent state series $\{ h_t \}_t$ and the observed series $\{ X_t \}_t$, rodu2014spectral introduced a reduced-dimensional series $\{ Y_t = U^\top X_t \}_t$, the projection of $X_t$ onto a lower-dimensional subspace whose dimension equals the number of hidden states. Spectral estimation is then carried out on $\{ Y_t \}_t$.
  • Figure 3: Convergence of the SHMM estimation error to its theoretical distribution. Empirical histograms of $\widehat{\Pr}(x_{1:T}) - \Pr(x_{1:T})$ are shown for different training sizes $N$ and sequence lengths $T$, alongside the theoretical density derived from Theorem 1. Each subfigure corresponds to a different $N$. (1) As $N$ increases, the empirical distribution converges to the theoretical normal distribution. (2) Convergence is faster for smaller values of $T$.
  • Figure 4: Histograms of the first-order error terms arising from the first-, second- and third-moment estimation errors (i.e., $(v + \tilde{v})^\top \widehat{\Delta \mu}$, $\sum_{t=0}^T b_t^\top \widehat{\Delta \Sigma} \tilde{b}_t$ and $\sum_{t=1}^T a_t^\top \widehat{\Delta K}(y_t) \tilde{a}_t$) for different training sizes $N$ with fixed length $T=30$, compared with the theoretical pdf derived from Theorem 1 (red line). Each subfigure corresponds to a different $N$. As $N$ increases, the distribution converges to the theoretical normal distribution.
  • Figure 5: Histograms of the Frobenius norm of the first-, second- and third-moment estimation errors (for $\mu$, $\Sigma$ and $K$, respectively) for different training sizes $N$, compared with the theoretical pdf (red line), here a chi-squared distribution. Each subfigure corresponds to a different $N$. As $N$ increases, the distribution converges to the theoretical distribution.
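
Figures 1 and 2 together describe the data-generating process and the dimension-reduction step. The sketch below simulates an HMM with Gaussian emissions and forms $Y_t = U^\top X_t$ with $k$ equal to the number of hidden states. How $U$ is chosen is an assumption here: it takes the top-$k$ left singular vectors of the empirical lag-one cross moment $\widehat{E}[X_{t+1} X_t^\top]$, mirroring the discrete spectral algorithm, and rodu2014spectral or the paper may construct $U$ differently. The Gaussian emissions, the uniform initial state, and the function names `simulate_hmm` and `projection_matrix` are likewise illustrative.

```python
import numpy as np


def simulate_hmm(T, means, n_steps, noise_sd, rng):
    """Sample (h_t, X_t) from an HMM with Gaussian emissions (illustrative model).

    T:     (m, m) column-stochastic, T[i, j] = Pr(h_{t+1}=i | h_t=j)
    means: (d, m) emission means; X_t ~ N(means[:, h_t], noise_sd^2 I)
    """
    m, d = T.shape[0], means.shape[0]
    h = np.empty(n_steps, dtype=int)
    h[0] = rng.integers(m)                       # uniform initial state (assumption)
    for t in range(1, n_steps):
        h[t] = rng.choice(m, p=T[:, h[t - 1]])   # Markov transition from h_{t-1}
    X = means[:, h].T + noise_sd * rng.standard_normal((n_steps, d))
    return h, X


def projection_matrix(X, k):
    """Top-k left singular vectors of the empirical cross moment E[X_{t+1} X_t^T]."""
    C = X[1:].T @ X[:-1] / (len(X) - 1)          # (d, d)
    U, _, _ = np.linalg.svd(C)
    return U[:, :k]


rng = np.random.default_rng(0)
T = np.array([[0.9, 0.2], [0.1, 0.8]])
means = np.array([[1.0, -1.0], [0.5, 2.0], [0.0, 1.0], [-1.0, 0.0]])  # d = 4, m = 2
h, X = simulate_hmm(T, means, 5000, 0.5, rng)
U = projection_matrix(X, k=2)
Y = X @ U                                        # row t is Y_t = U^T X_t
print(X.shape, Y.shape)                          # (5000, 4) (5000, 2)
```

Because the emission noise is independent across time, it contributes nothing to the cross moment in expectation, so the span of $U$ approximately recovers the span of the emission means and little state information is lost in $Y_t$.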

Theorems & Definitions (4)

  • Lemma 1
  • Theorem 1
  • Proof of Theorem 1
  • Proof of Lemma 1