A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning

Ingvar Ziemann

TL;DR

This paper analyzes the consistency and non-asymptotic recovery rates of the Gaussian maximum likelihood estimator (MLE) in linear auto-regressive models using an information-theoretic approach. By leveraging the Hellinger distance, mutual information, and the Donsker-Varadhan variational bound, the author bounds the estimation error of the MLE in terms of the mutual information $I(\widehat{\mathsf{P}} \parallel Z_{1:n})$, obtaining the key inequality $\mathbb{E}\big[d_{\mathrm{H}}^2(\widehat{\mathsf{P}} \parallel \mathsf{P}_\star)\big] \le 2 I(\widehat{\mathsf{P}} \parallel Z_{1:n})$ and translating this into a finite-sample bound for the linear AR parameters. The main result is the bound $\mathbb{E}\, \mathrm{tr}\big((A_\star-\widehat{A})^\mathsf{T}(A_\star-\widehat{A}) \frac{1}{n}\sum_{i=1}^n \sum_{k=1}^{i} A_\star^{k-1} A_\star^{\mathsf{T},k-1}\big) \le (2\times 10^4)\frac{I(\widehat{\mathsf{P}} \parallel Z_{1:n})}{n}$, together with the observation that for finite hypothesis classes $I(\widehat{\mathsf{P}} \parallel Z_{1:n}) \le \log |\mathscr{P}|$. The analysis sidesteps lower-tail control, uses the Gaussian total variation structure, and suggests extensions to parametric classes via discretization when stability holds. Overall, the work provides non-asymptotic, dependence-aware guarantees for recovering linear dynamical system parameters from dependent data, without requiring stability assumptions in the finite-class setting.
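To make the quantities in the main bound concrete, the following is a minimal sketch (not the author's code) that evaluates the weighting matrix $\frac{1}{n}\sum_{i=1}^n \sum_{k=1}^{i} A_\star^{k-1} A_\star^{\mathsf{T},k-1}$, the weighted error inside the trace for one given estimate, and the right-hand side of the bound for a finite class with $I(\widehat{\mathsf{P}} \parallel Z_{1:n})$ replaced by $\log|\mathscr{P}|$. The function names `weighting_matrix`, `weighted_error`, and `finite_class_bound` are hypothetical, chosen for illustration. Note that the bound concerns the expectation of the weighted error, whereas `weighted_error` evaluates a single realization.

```python
import numpy as np


def weighting_matrix(A, n):
    """Return (1/n) * sum_{i=1}^n sum_{k=1}^i A^{k-1} (A^{k-1})^T."""
    d = A.shape[0]
    power = np.eye(d)          # holds A^{k-1}, starting at k = 1
    inner = np.zeros((d, d))   # running inner sum over k = 1..i
    total = np.zeros((d, d))   # running outer sum over i = 1..n
    for _ in range(n):
        inner += power @ power.T
        total += inner
        power = A @ power
    return total / n


def weighted_error(A_star, A_hat, n):
    """Trace term from the main bound, for one fixed estimate A_hat."""
    D = A_star - A_hat
    return float(np.trace(D.T @ D @ weighting_matrix(A_star, n)))


def finite_class_bound(num_hypotheses, n):
    """Right-hand side (2e4) * log|P| / n for a finite hypothesis class."""
    return 2e4 * np.log(num_hypotheses) / n
```

For $A_\star = 0$ the weighting matrix reduces to the identity, so the weighted error is the squared Frobenius norm of $A_\star - \widehat{A}$; for larger $A_\star$ the weighting grows, reflecting the stronger signal carried by the trajectory.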

Abstract

In this note, we give a short information-theoretic proof of the consistency of the Gaussian maximum likelihood estimator in linear auto-regressive models. Our proof yields nearly optimal non-asymptotic rates for parameter recovery and works without any invocation of stability in the case of finite hypothesis classes.

Paper Structure (4 sections, 4 theorems, 13 equations)


Introduction
Information-Theoretic Preliminaries
Learning Generative Models in Hellinger Distance
Proof of the Main Result

Key Result

Theorem 1.1

Fix $W_{1:n} \sim N(0,I)$ and let $\mathsf{P}_{A_\star}$ be such that the $Z_{1:n}$ satisfy $Z_{k}=A_\star Z_{k-1}+W_k$ for $k = 2, \dots, n$ and $Z_1=W_1$. The maximum likelihood estimator $\widehat{A}$ (defined in the section "Learning Generative Models in Hellinger Distance") over any hypothesis class of the form $\mathscr{P}=\{\mathsf{P}_
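The data model in the theorem can be illustrated with a short simulation. The sketch below assumes the setup stated above ($Z_1 = W_1$, $Z_k = A_\star Z_{k-1} + W_k$ with standard Gaussian noise) and a finite list of candidate matrices. Under this model the Gaussian log-likelihood of a candidate $A$ equals, up to terms that do not depend on $A$, $-\tfrac{1}{2}\sum_{k=2}^{n} \|Z_k - A Z_{k-1}\|^2$, so maximizing the likelihood over the class amounts to minimizing the one-step squared residuals. The names `simulate_ar` and `gaussian_mle_finite` are hypothetical, and the paper's own definition of the estimator should be consulted for the precise formulation.

```python
import numpy as np


def simulate_ar(A_star, n, rng):
    """Draw Z_1 = W_1 and Z_k = A_star Z_{k-1} + W_k with W_k ~ N(0, I)."""
    d = A_star.shape[0]
    W = rng.standard_normal((n, d))
    Z = np.empty((n, d))
    Z[0] = W[0]
    for k in range(1, n):
        Z[k] = A_star @ Z[k - 1] + W[k]
    return Z


def gaussian_mle_finite(Z, candidates):
    """Index of the candidate with the smallest negative log-likelihood.

    Terms that do not depend on A (normalizing constants and ||Z_1||^2)
    are dropped, since they do not affect the argmin.
    """
    losses = [0.5 * np.sum((Z[1:] - Z[:-1] @ A.T) ** 2) for A in candidates]
    return int(np.argmin(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = [0.5 * np.eye(2), -0.5 * np.eye(2)]  # illustrative class
    Z = simulate_ar(candidates[0], n=2000, rng=rng)
    print("selected candidate index:", gaussian_mle_finite(Z, candidates))
```

Nothing in the estimator requires the candidates to be stable matrices, which is in line with the claim that the finite-class result needs no stability assumption.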

Theorems & Definitions (6)

Theorem 1.1
Lemma 2.1 (Donsker and Varadhan, 1975)
Lemma 2.2
proof
Theorem 3.1
proof
