Pathwise guessing in categorical time series with unbounded alphabets

J.-R. Chazottes, S. Gallo, D. Takahashi

TL;DR

This paper develops a non-parametric probabilistic guessing framework for categorical time series with potentially unbounded alphabets and long memory. It introduces a data-driven estimator that maximizes conditioned empirical frequencies, and it provides risk bounds that are independent of the alphabet size under a general dependence condition captured by $\Gamma(p)$ and a margin parameter $\delta_{D,G}$. The authors prove an upper bound and a minimax lower bound that match up to a logarithmic factor, with explicit rates that depend on the margin regime. They show that the framework applies to a broad set of models, including Markov chains, autoregressive models, Poisson regression, hidden Markov chains, mixtures, and Gibbs measures. The results rely on a DKW-type inequality for dependent sequences and establish exponential convergence in favorable margin regimes, which is most useful when the alphabet is large or unbounded. Overall, the work provides a principled, non-parametric approach to guessing in complex time-series settings where traditional conditional-probability estimation would be impractical.

Abstract

The following learning problem arises naturally in various applications: Given a finite sample from a categorical or count time series, can we learn a function of the sample that (nearly) maximizes the probability of correctly guessing the values of a given portion of the data using the values from the remaining parts? Unlike classical approaches in statistical inference, our approach avoids explicitly estimating the conditional probabilities. We propose a non-parametric guessing function with a learning rate independent of the alphabet size. Our analysis focuses on a broad class of time series models that encompasses finite-order Markov chains, some hidden Markov chains, Poisson regression for count processes, and one-dimensional Gibbs measures. We provide a margin condition that controls the rate of convergence for the risk. Additionally, we establish a minimax lower bound for the convergence rate of the risk associated with our guessing problem. This lower bound matches the upper bound achieved by our estimator up to a logarithmic factor, demonstrating its near-optimality.
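As a rough illustration of guessing by maximizing conditioned empirical frequencies, the sketch below handles only a simplified special case: guessing a single symbol from the `k` symbols that precede it. The function names `fit_guesser` and `guess`, the parameter `k`, and the toy sample are assumptions made for this example; this is not the authors' estimator, which covers general observed and guessed portions of the sample.

```python
from collections import Counter, defaultdict


def fit_guesser(sample, k):
    """Count, for each length-k context, which symbols follow it in the sample."""
    counts = defaultdict(Counter)
    for i in range(k, len(sample)):
        context = tuple(sample[i - k:i])
        counts[context][sample[i]] += 1
    return counts


def guess(counts, context, default=None):
    """Return the symbol seen most often after `context`, or `default` if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return default
    return followers.most_common(1)[0][0]


# Hypothetical toy sample; symbols may be any hashable values.
sample = [0, 1, 0, 1, 0, 2, 0, 1, 0, 1]
counts = fit_guesser(sample, k=1)
```

Because the counts are stored only for contexts and symbols that actually occur, the sketch never enumerates the alphabet, which is the practical point when the alphabet is large or unbounded. Calling `guess(counts, [0])` returns the most frequent follower of `0` in the toy sample, and an unseen context falls back to `default`.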

Paper Structure (22 sections, 8 theorems, 77 equations)

Introduction
The probabilistic guessing problem
Notation.
The probabilistic guessing problem.
The estimator.
Assumption and statement of the results
Examples satisfying Assumption \ref{eq:Gammadef}
Independent random variables.
Markov chains.
Autoregressive models.
Poisson regression for count time series.
Hidden Markov chains.
Convex mixture of Markov chains.
Gibbs measures.
Proofs
...and 7 more sections

Key Result

Proposition 3.1

If the left conditional probability $p$ of a process with distribution $\mathds{P}$ is such that then

Theorems & Definitions (9)

Definition 2.1
Proposition 3.1
Theorem 3.1
Corollary 3.1
Theorem 3.2
Proposition 4.1
Theorem A.1
Theorem B.1
Lemma C.1
