A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning
Ingvar Ziemann
TL;DR
This paper analyzes the consistency and non-asymptotic recovery rates of the Gaussian maximum likelihood estimator (MLE) in linear autoregressive (AR) models using an information-theoretic approach. Combining the Hellinger distance, mutual information, and the Donsker-Varadhan variational bound, the author controls the estimation error of the MLE $\widehat{\mathsf{P}}$ by the mutual information $I(\widehat{\mathsf{P}} \parallel Z_{1:n})$ between the estimator and the data $Z_{1:n}$. The key inequality is $\mathbb{E}\big[d_{\mathrm{H}}^2(\widehat{\mathsf{P}} \parallel \mathsf{P}_\star)\big] \le 2 I(\widehat{\mathsf{P}} \parallel Z_{1:n})$, which is then translated into a finite-sample bound on the linear AR parameters. The main result is $\mathbb{E}\, \mathrm{tr}\big((A_\star-\hat{A})^\mathsf{T}(A_\star-\hat{A}) \frac{1}{n}\sum_{i=1}^n \sum_{k=1}^{i} A_\star^{k-1} (A_\star^{\mathsf{T}})^{k-1}\big) \le (2\times 10^4)\frac{I(\widehat{\mathsf{P}} \parallel Z_{1:n})}{n}$, where $A_\star$ is the true parameter matrix and $\hat{A}$ is the MLE; for finite hypothesis classes $\mathscr{P}$, the mutual information satisfies $I(\widehat{\mathsf{P}} \parallel Z_{1:n}) \le \log |\mathscr{P}|$. The analysis sidesteps lower-tail control, uses the structure of Gaussian total variation, and suggests extensions to parametric classes via discretization when stability holds. Overall, the work provides non-asymptotic, dependence-aware guarantees for recovering the parameters of linear dynamical systems from dependent data, and it requires no stochastic stability assumptions in the finite-class setting.
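For a finite hypothesis class, the two statements above can be chained into a single explicit rate. The display below is only a direct substitution of $I(\widehat{\mathsf{P}} \parallel Z_{1:n}) \le \log |\mathscr{P}|$ into the main bound, using the symbols from the summary; it is not an additional result.

```latex
\mathbb{E}\,\mathrm{tr}\Big((A_\star-\hat{A})^{\mathsf{T}}(A_\star-\hat{A})\,
  \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{i} A_\star^{k-1}\big(A_\star^{\mathsf{T}}\big)^{k-1}\Big)
\;\le\; (2\times 10^{4})\,\frac{I(\widehat{\mathsf{P}} \parallel Z_{1:n})}{n}
\;\le\; (2\times 10^{4})\,\frac{\log|\mathscr{P}|}{n}.
```

For a class of fixed size, the weighted parameter error therefore decays at rate $1/n$.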
Abstract
In this note, we give a short information-theoretic proof of the consistency of the Gaussian maximum likelihood estimator in linear auto-regressive models. Our proof yields nearly optimal non-asymptotic rates for parameter recovery and works without any invocation of stability in the case of finite hypothesis classes.
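To make the finite-class setting concrete, here is a minimal Python sketch. It rests on assumptions that the text above does not state: the model is taken to be $X_{t+1} = A_\star X_t + W_t$ with $X_0 = 0$ and standard Gaussian noise $W_t \sim N(0, I)$, under which the Gaussian MLE over a finite class reduces to minimizing the sum of squared one-step residuals. The function names (`simulate_ar`, `gaussian_mle_finite_class`, `weighted_error`) and the candidate matrices are hypothetical and chosen only for illustration.

```python
import math

import numpy as np


def simulate_ar(A, n, rng):
    """Simulate X_{t+1} = A X_t + W_t with X_0 = 0 and W_t ~ N(0, I) (assumed model)."""
    d = A.shape[0]
    X = np.zeros((n + 1, d))
    for t in range(n):
        X[t + 1] = A @ X[t] + rng.standard_normal(d)
    return X


def gaussian_mle_finite_class(X, candidates):
    """Return the index of the candidate with the smallest squared one-step residual.

    With identity noise covariance (an assumption), this coincides with the
    Gaussian maximum likelihood estimator restricted to the finite class.
    """
    past, future = X[:-1], X[1:]
    losses = [np.sum((future - past @ A.T) ** 2) for A in candidates]
    return int(np.argmin(losses))


def weighted_error(A_star, A_hat, n):
    """Compute tr((A*-Ahat)^T (A*-Ahat) (1/n) sum_{i=1}^n sum_{k=1}^i A*^{k-1} (A*^T)^{k-1})."""
    d = A_star.shape[0]
    gram = np.zeros((d, d))   # inner sum over k = 1..i
    total = np.zeros((d, d))  # outer sum over i = 1..n
    power = np.eye(d)         # holds A*^{k-1}
    for _ in range(n):
        gram += power @ power.T
        total += gram
        power = A_star @ power
    diff = A_star - A_hat
    return float(np.trace(diff.T @ diff @ (total / n)))


# Illustrative run: a small, well-separated hypothetical class containing the truth.
rng = np.random.default_rng(0)
n = 500
A_star = 0.5 * np.eye(2)
candidates = [A_star, -0.5 * np.eye(2), np.zeros((2, 2))]

X = simulate_ar(A_star, n, rng)
idx = gaussian_mle_finite_class(X, candidates)
err = weighted_error(A_star, candidates[idx], n)
print("selected candidate index:", idx)
print("weighted error:", err)
print("log|P| / n:", math.log(len(candidates)) / n)
```

The sketch reports the weighted error next to $\log|\mathscr{P}|/n$ for a single draw, whereas the stated bound concerns the expectation and carries the constant $2\times 10^4$. It is an illustration of the quantities involved, not a numerical verification of the result.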
