Prediction from compression for models with infinite memory, with applications to hidden Markov and renewal processes

Yanjun Han; Tianze Jiang; Yihong Wu

TL;DR

This work develops a universal-compression framework to predict the next symbol in sequences generated by processes with long memory, notably Hidden Markov Models and renewal processes. By decomposing the minimax prediction risk into a redundancy term and a memory term, the authors derive tight upper and matching lower bounds, showing that for bounded-state HMMs the optimal KL prediction risk scales as $\Theta\big(\frac{k\ell}{n}\log\frac{n}{k\ell} + \frac{k^2}{n}\log\frac{n}{k^2}\big)$. They provide a polynomial-time estimator achieving the optimal rate when $k$ and $\ell$ are constant, and extend the analysis to Gaussian emissions via a general corollary; for renewal processes the rate is $\Theta(n^{-1/2})$, with non-efficient optimal predictors. The results unify prediction and universal compression, yield practical DP-based algorithms for HMM prediction, and illuminate fundamental trade-offs between memory, redundancy, and computation in sequential prediction problems.
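As a quick consistency check, the general rate above can be specialized to the constant-size regime. The sketch below treats $k$ and $\ell$ as fixed constants while $n \to \infty$ (an assumption made here for illustration), which recovers the $\Theta(\log n/n)$ rate quoted in the abstract:

$$
\frac{k\ell}{n}\log\frac{n}{k\ell} + \frac{k^2}{n}\log\frac{n}{k^2}
= \frac{k\ell}{n}\bigl(\log n - \log(k\ell)\bigr) + \frac{k^2}{n}\bigl(\log n - 2\log k\bigr)
= \Theta\Bigl(\frac{\log n}{n}\Bigr), \qquad k,\ell = O(1).
$$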

Abstract

Consider the problem of predicting the next symbol given a sample path of length n, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback-Leibler divergence up to universal constant factors. Extending existing results in finite-order Markov models [HJW23] and drawing ideas from universal compression, the proposed estimator has a prediction risk bounded by redundancy of the distribution class and a memory term that accounts for the long-range dependency of the model. Notably, for HMMs with bounded state and observation spaces, a polynomial-time estimator based on dynamic programming is shown to achieve the optimal prediction risk Θ(log n/n); prior to this work, the only known result of this type is O(1/log n) obtained using Markov approximation [Sha+18]. Matching minimax lower bounds are obtained by making connections to redundancy and mutual information via a reduction argument.
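To make the benchmark concrete, here is a minimal Python sketch of the "conditional predictor that knows the true model" for a discrete HMM, computed with the standard normalized forward recursion. It is not the paper's estimator, which does not know the parameters. The names `oracle_next_symbol`, `kl_divergence`, `pi`, `A`, and `B` are illustrative assumptions, and the observed sequence is assumed to have positive probability under the model.

```python
import numpy as np

def oracle_next_symbol(pi, A, B, x):
    """Return P(X_{n+1} = . | X_1..X_n = x) for a KNOWN HMM.

    Assumed conventions (hypothetical, for illustration only):
      pi[i]   = P(Z_1 = i)                 initial state distribution
      A[i, j] = P(Z_{t+1} = j | Z_t = i)   state transition matrix
      B[i, a] = P(X_t = a | Z_t = i)       emission matrix
      x       = non-empty list of observed symbols (integer indices)
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    # Filtering distribution P(Z_1 | x_1), normalized to avoid underflow.
    alpha = pi * B[:, x[0]]
    alpha = alpha / alpha.sum()
    for symbol in x[1:]:
        # Predict one step ahead, then condition on the observed symbol.
        alpha = (alpha @ A) * B[:, symbol]
        alpha = alpha / alpha.sum()
    # Push the filter forward once more and mix the emission rows.
    return (alpha @ A) @ B

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: 2 hidden states, 2 observation symbols.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
oracle = oracle_next_symbol(pi, A, B, [0, 1, 1, 0])
# Loss of a naive uniform guess relative to the oracle on this sample path.
loss = kl_divergence(oracle, [0.5, 0.5])
```

The prediction risk discussed in the abstract averages this kind of KL loss over sample paths, with the uniform guess replaced by an estimator that only sees the data.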

Paper Structure (39 sections, 29 theorems, 118 equations, 1 figure)

Introduction
Main results
Hidden Markov Models
Renewal processes
Related works
Prediction risk bound based on universal compression
Proof of the upper bounds
Bounding the memory term for HMMs
Redundancy bound for HMM
An optimal prediction algorithm
Renewal processes
Proof of the lower bounds
Reduction from redundancy to prediction risk
Lower bounding the redundancy of HMM
Large $\ell$.
...and 24 more sections

Key Result

Theorem 1

The following holds:

Figures (1)

Figure 1: Algorithm for computing ${\mathcal{A}}(M,T; x^{K})$, the number of satisfying hidden state sequences.

Theorems & Definitions (45)

Theorem 1: Optimal prediction risk for HMM
Theorem 2: Computationally efficient algorithms
Theorem 3: Informal: Computational lower bounds
Theorem 4: Prediction of renewal processes
Proposition 1: Upper bound prediction risk by redundancy
Proposition 2
Proposition 3
Corollary 1
Remark 1
Lemma 1
...and 35 more
