Learning Hidden Markov Models Using Conditional Samples

Sham M. Kakade, Akshay Krishnamurthy, Gaurav Mahajan, Cyril Zhang

TL;DR

The paper tackles the cryptographic hardness of learning Hidden Markov Models by introducing interactive conditional sampling. It develops two main results: an exact conditional-probability setting that generalizes Angluin's $L^*$ to efficiently learn any rank-$r$ HMM in $\mathrm{poly}(r,T,1/\varepsilon,\log(1/\delta))$ time, and a conditional-sampling setting where learnability is guaranteed for high-fidelity distributions with complexity $\mathrm{poly}(r,T,O,1/\Delta^*,1/\varepsilon,\log(1/\delta))$. The core ideas are an observable-operator representation with a basis-based efficient encoding, a circulant-structure propagation of coefficients, and a novel perturbation analysis that controls error amplification across long sequences; the fidelity parameter $\Delta^*$ plays a central role in determining tractability. These results extend to broader latent low-rank distributions and connect interactive learning with DFA learning through a robust $L^*$-style framework, suggesting practical avenues for interactive training of sequential models while highlighting open questions about removing fidelity assumptions.

Abstract

This paper is concerned with the computational complexity of learning the Hidden Markov Model (HMM). Although HMMs are some of the most widely used tools in sequential and time series modeling, they are cryptographically hard to learn in the standard setting where one has access to i.i.d. samples of observation sequences. In this paper, we depart from this setup and consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs. We show that interactive access to the HMM enables computationally efficient learning algorithms, thereby bypassing cryptographic hardness. Specifically, we obtain efficient algorithms for learning HMMs in two settings: (a) An easier setting where we have query access to the exact conditional probabilities. Here our algorithm runs in polynomial time and makes polynomially many queries to approximate any HMM in total variation distance. (b) A harder setting where we can only obtain samples from the conditional distributions. Here the performance of the algorithm depends on a new parameter, called the fidelity of the HMM. We show that this captures cryptographically hard instances and previously known positive results. We also show that these results extend to a broader class of distributions with latent low rank structure. Our algorithms can be viewed as generalizations and robustifications of Angluin's $L^*$ algorithm for learning deterministic finite automata from membership queries.
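The two access models in the abstract can be illustrated with a minimal sketch. It assumes a toy 2-state HMM over binary observations. The parameters `INIT`, `TRANS`, `EMIT` and the function names `prefix_prob`, `exact_oracle`, and `sampling_oracle` are hypothetical and chosen only for illustration, not taken from the paper. The exact oracle returns $\Pr[\text{future} \mid \text{history}]$ using the forward algorithm, and the sampling oracle draws a continuation from the same conditional distribution one symbol at a time.

```python
import random
from itertools import product

# Hypothetical 2-state HMM over observations {0, 1}; all numbers are illustrative.
INIT = [0.6, 0.4]                    # INIT[s]      = Pr[first hidden state is s]
TRANS = [[0.7, 0.3], [0.2, 0.8]]     # TRANS[s][s2] = Pr[next state s2 | state s]
EMIT = [[0.9, 0.1], [0.3, 0.7]]      # EMIT[s][o]   = Pr[observation o | state s]


def prefix_prob(seq):
    """Pr[x_1..x_k = seq], computed with the forward algorithm."""
    alpha = list(INIT)  # alpha[s] = Pr[observations so far, current state s]
    for i, o in enumerate(seq):
        if i > 0:  # move to the next hidden state before emitting again
            alpha = [sum(alpha[s] * TRANS[s][s2] for s in range(2))
                     for s2 in range(2)]
        alpha = [alpha[s] * EMIT[s][o] for s in range(2)]
    return sum(alpha)


def exact_oracle(history, future):
    """Exact conditional probability Pr[future | history]."""
    p_hist = prefix_prob(tuple(history))
    if p_hist == 0.0:
        raise ValueError("history has probability zero")
    return prefix_prob(tuple(history) + tuple(future)) / p_hist


def sampling_oracle(history, length, rng):
    """Draw `length` further observations from Pr[. | history]."""
    out = []
    for _ in range(length):
        # Chain rule: sample the next symbol given everything seen so far.
        p_one = exact_oracle(tuple(history) + tuple(out), (1,))
        out.append(1 if rng.random() < p_one else 0)
    return tuple(out)


# Sanity check: conditional probabilities over all length-2 futures sum to 1.
TOTAL = sum(exact_oracle((0, 1), f) for f in product((0, 1), repeat=2))
```

In setting (a) a learner would call something like `exact_oracle` directly; in setting (b) it would only see draws such as `sampling_oracle((0, 1), 3, random.Random(0))` and would have to estimate the probabilities from them.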

Paper Structure

This paper contains 39 sections, 41 theorems, 219 equations, 2 figures, and 3 algorithms.

Key Result

Theorem 1

Assume $\mathcal{O} = \{0,1\}$. Let $\Pr[\cdot]$ be any rank $r$ distribution over observation sequences of length $T$. Pick any $0 < \varepsilon, \delta < 1$. Then the algorithm (alg:exact), with access to an exact probability oracle and samples from $\Pr[\cdot]$, runs in $\mathop{\mathrm{poly}}\nolimits(r,T,1/\varepsilon,\log(1/\delta))$ time …

Figures (2)

  • Figure 1: Schematic of the circulant structure relating the $\Pr[F_t | H_t]$ and $\Pr[F_{t+1} | H_{t+1}]$ matrices. Columns of $\Pr[F_t \mid H_t]$ can be represented linearly in basis $B_t$ using coefficients $\beta(\cdot)$. The blocks $\Pr[oF_{t+1} \mid B_t]$ appear in the next matrix $\Pr[F_{t+1} \mid H_{t+1}]$ (up to scaling), so they can be represented in basis $B_{t+1}$, yielding operators $A_{o,t}$.
  • Figure 2: Hidden Markov model for noisy parity. Each hidden state is of the form $(z_t, b_t, t)$ where $z_t$ represents the current bit to be output, $b_t$ is the parity of a secret subset of previous bits and $t$ is the bit position. $b_1$ is always set to $0$ and $z_T$ is set to $b_T$ with probability $\alpha$ and $1 - b_T$ otherwise for some $\alpha \in (0,1/2)$. For other positions $t\in [T-1]$, transition from hidden state $(z_t, b_t, t)$ goes uniformly randomly to hidden states $(1,b_{t+1}, t+1)$ and $(0,b_{t+1}, t+1)$, where $b_{t+1} = b_{t} \oplus z_t$ if $t \in I$ and $b_{t+1} = b_t$ otherwise.
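The noisy-parity construction in the Figure 2 caption can be sketched as a sampler. This is a minimal illustration that follows the caption literally; the function name `sample_noisy_parity` is hypothetical, and the sketch assumes that $z_1$ is also uniform, which the caption does not state explicitly.

```python
import random


def sample_noisy_parity(T, I, alpha, rng):
    """Sample z_1..z_T from the noisy-parity HMM described in the caption.

    I is the secret subset of positions in {1, ..., T-1} (1-indexed).
    Returns the observation list and the final parity bit b_T.
    """
    b = 0  # b_1 is always 0
    z = []
    for t in range(1, T):          # positions 1, ..., T-1
        z_t = rng.randrange(2)     # uniform output bit (assumed for z_1 too)
        z.append(z_t)
        if t in I:
            b ^= z_t               # b_{t+1} = b_t XOR z_t when t is in I
        # otherwise b_{t+1} = b_t
    # After the loop, b holds b_T.
    z_T = b if rng.random() < alpha else 1 - b
    z.append(z_T)
    return z, b
```

For example, `sample_noisy_parity(6, {1, 3, 4}, 0.25, random.Random(0))` returns six bits whose last entry equals the parity of positions 1, 3, and 4 with probability `alpha` and its complement otherwise.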

Theorems & Definitions (61)

  • Definition 1.1: Hidden Markov Models
  • Definition 1.2: Rank of a distribution
  • Definition 1.2: Exact conditional probability oracle
  • Definition 1.2: Conditional sampling oracle
  • Remark 1.3
  • Theorem 1: Learning with exact conditional probabilities
  • Theorem 2: Learning with conditional samples
  • Remark 2.1
  • Remark 2.2
  • Definition 2.2: Fidelity
  • ...and 51 more
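The rank notion listed above can be made concrete with a small numerical check. The sketch assumes that the rank refers to the rank of the conditional matrix $\Pr[F_t \mid H_t]$ from Figure 1 (rows indexed by futures, columns by histories); the full definition is not reproduced in this excerpt, so treat that reading as an assumption. The toy HMM parameters and helper names are hypothetical. Exact rational arithmetic avoids floating-point rank ambiguity.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-state HMM over observations {0, 1}; numbers are illustrative.
INIT = [Fraction(3, 5), Fraction(2, 5)]
TRANS = [[Fraction(7, 10), Fraction(3, 10)], [Fraction(1, 5), Fraction(4, 5)]]
EMIT = [[Fraction(9, 10), Fraction(1, 10)], [Fraction(3, 10), Fraction(7, 10)]]


def seq_prob(seq):
    """Pr[x_1..x_k = seq] via the forward algorithm, in exact arithmetic."""
    alpha = list(INIT)
    for i, o in enumerate(seq):
        if i > 0:
            alpha = [sum(alpha[s] * TRANS[s][s2] for s in range(2))
                     for s2 in range(2)]
        alpha = [alpha[s] * EMIT[s][o] for s in range(2)]
    return sum(alpha)


def matrix_rank(rows):
    """Rank of a matrix of Fractions by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


# Conditional matrix with t = 2: entry (f, h) is Pr[f | h] = Pr[h f] / Pr[h].
HISTORIES = list(product((0, 1), repeat=2))
FUTURES = list(product((0, 1), repeat=2))
M = [[seq_prob(h + f) / seq_prob(h) for h in HISTORIES] for f in FUTURES]
RANK = matrix_rank(M)
```

Although `M` is a 4 x 4 matrix, every column is a mixture of the two per-state future distributions, so `RANK` cannot exceed the number of hidden states.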