Rate-Optimal Non-Asymptotics for the Quadratic Prediction Error Method

Charis Stamouli; Ingvar Ziemann; George J. Pappas

TL;DR

The paper establishes non-asymptotic guarantees for the quadratic prediction error method in time-varying, nonlinearly parametrized predictor models, with rate-optimal bounds that match the classical asymptotic rates up to constant factors. Its analysis builds on martingale offset arguments to control the time-varying regression functions and yields a leading error term of order $\mathcal{O}\left(\frac{d_\theta \sigma_w^2}{T}\right)$, valid after a burn-in time $T_0$ that depends polynomially on the model parameters and the sub-Gaussian noise parameter; logarithmic factors appear only in terms that become negligible for large $T$. The authors apply these results to a class of identifiable ARMA models, whose one-step predictors are nonlinear in the parameters, obtaining the first non-asymptotic, rate-optimal identification guarantees for this setting. Overall, the work provides finite-sample performance guarantees for nonlinear time-series prediction and identification, extending rate-optimal non-asymptotic analysis beyond linear models.
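To make the leading term $\mathcal{O}\left(\frac{d_\theta \sigma_w^2}{T}\right)$ concrete, the sketch below looks at the simplest linear special case. It is not the paper's method or analysis. It assumes a scalar AR(1) model $y_t = a y_{t-1} + w_t$ with Gaussian noise (so $d_\theta = 1$), fits $a$ by ordinary least squares, and estimates by Monte Carlo the excess one-step prediction error $(\hat a - a)^2 \gamma_0$, where $\gamma_0 = \sigma_w^2/(1-a^2)$ is the stationary variance of $y_t$. After rescaling by $T/\sigma_w^2$, this quantity should stay near $d_\theta = 1$ as $T$ grows. That is the kind of classical asymptotic behaviour the paper's bounds are said to match. The function name and all parameter values are hypothetical.

```python
import random


def ar1_normalized_excess_risk(a=0.5, sigma=1.0, T=500, reps=200, seed=0):
    """Monte Carlo estimate of T * E[excess prediction error] / sigma^2 for AR(1) least squares.

    Illustrative assumption: y_t = a*y_{t-1} + w_t with w_t ~ N(0, sigma^2) and |a| < 1.
    The excess one-step prediction error of an estimate a_hat is (a_hat - a)^2 * gamma0,
    where gamma0 = sigma^2 / (1 - a^2) is the stationary variance of y_t.
    """
    rng = random.Random(seed)
    gamma0 = sigma**2 / (1 - a**2)
    total = 0.0
    for _ in range(reps):
        # Draw y_0 from the stationary distribution so the series is stationary from the start.
        y_prev = rng.gauss(0.0, gamma0**0.5)
        sxy = sxx = 0.0
        for _ in range(T):
            y = a * y_prev + rng.gauss(0.0, sigma)
            sxy += y_prev * y  # running sum of y_{t-1} * y_t
            sxx += y_prev * y_prev  # running sum of y_{t-1}^2
            y_prev = y
        a_hat = sxy / sxx  # ordinary least-squares estimate of a
        total += (a_hat - a) ** 2 * gamma0
    # Average excess error rescaled by T / sigma^2; classical asymptotics predict about d = 1.
    return T * (total / reps) / sigma**2


if __name__ == "__main__":
    for T in (250, 1000, 4000):
        print(T, round(ar1_normalized_excess_risk(T=T), 2))
```

If the excess error decays like $d_\theta \sigma_w^2 / T$, the printed normalized values should hover around 1 for every $T$, up to Monte Carlo noise of roughly $\sqrt{2/\text{reps}}$.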

Abstract

We study the quadratic prediction error method (i.e., nonlinear least squares) for a class of time-varying parametric predictor models satisfying a certain identifiability condition. While this method is known to asymptotically achieve the optimal rate for a wide range of problems, there have been no non-asymptotic results matching these optimal rates outside of a select few, typically linear, model classes. By leveraging modern tools from learning with dependent data, we provide the first rate-optimal non-asymptotic analysis of this method for our more general setting of nonlinearly parametrized model classes. Moreover, we show that our results can be applied to a particular class of identifiable AutoRegressive Moving Average (ARMA) models, resulting in the first optimal non-asymptotic rates for identification of ARMA models.
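The quadratic prediction error method can be sketched for an ARMA(1,1) model. This is a minimal illustration under stated assumptions and is neither the paper's estimator nor its specific identifiable ARMA class. It assumes $y_t = a y_{t-1} + w_t + c\,w_{t-1}$ with Gaussian noise and zero initial conditions. It also assumes $|c| < 1$, so the one-step predictor is stable, and $a + c \neq 0$, so there is no pole-zero cancellation. The predictor $\hat y_t(\theta) = a y_{t-1} + c\,\varepsilon_{t-1}(\theta)$ feeds back its own past prediction errors. It is therefore nonlinear in $\theta = (a, c)$, and the zero initialization also makes it depend on $t$. An empirical quadratic loss of the form $\frac{1}{T}\sum_{t=1}^{T} (y_t - \hat y_t(\theta))^2$ is minimized by brute-force grid search over a finite set that stands in for $\mathsf{M}$. All function names, grid settings, and parameter values are hypothetical.

```python
import random


def simulate_arma11(a, c, sigma, T, seed=0):
    """Simulate y_t = a*y_{t-1} + w_t + c*w_{t-1}, w_t ~ N(0, sigma^2), with y_0 = w_0 = 0."""
    rng = random.Random(seed)
    ys, y_prev, w_prev = [], 0.0, 0.0
    for _ in range(T):
        w = rng.gauss(0.0, sigma)
        y = a * y_prev + w + c * w_prev
        ys.append(y)
        y_prev, w_prev = y, w
    return ys


def pem_loss(theta, ys):
    """Empirical quadratic prediction error (1/T) * sum_t (y_t - yhat_t(theta))^2.

    yhat_t(theta) = a*y_{t-1} + c*eps_{t-1}(theta), where eps_t(theta) = y_t - yhat_t(theta)
    is computed recursively from zero initial conditions. At the true parameters this
    recursion recovers eps_t = w_t exactly, because the simulation also starts from zero.
    """
    a, c = theta
    y_prev = eps_prev = total = 0.0
    for y in ys:
        eps = y - (a * y_prev + c * eps_prev)  # one-step prediction error
        total += eps * eps
        y_prev, eps_prev = y, eps
    return total / len(ys)


def pem_grid_search(ys, step=0.05, bound=0.9):
    """Return the minimizer of pem_loss over the grid {k*step : |k*step| <= bound}^2."""
    n = int(round(bound / step))
    grid = [k * step for k in range(-n, n + 1)]
    candidates = [(a, c) for a in grid for c in grid]
    return min(candidates, key=lambda theta: pem_loss(theta, ys))


if __name__ == "__main__":
    ys = simulate_arma11(a=0.5, c=0.4, sigma=1.0, T=2000, seed=1)
    a_hat, c_hat = pem_grid_search(ys)
    print("estimate:", round(a_hat, 2), round(c_hat, 2))
    print("loss at estimate:", round(pem_loss((a_hat, c_hat), ys), 3))
```

Grid search is used here only because it is transparent and dependency-free. In practice a gradient-based or Gauss-Newton solver would be more typical, and because this loss is nonconvex in $(a, c)$, such a solver may need a reasonable initialization.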

Paper Structure

This paper contains 10 sections, 16 theorems, and 191 equations.

Introduction
Problem Formulation
Optimal Non-asymptotic Rates for the Quadratic Prediction Error Method
Proof Sketch of Theorem 1
Case Study: The ARMA Model
Basic Definitions and Results
Proof of Theorem 2
Proof of Lemma \ref{lemma:taylor_bound}
Proof of Theorem \ref{theorem:expected_self_normalized_martingale}
Proof of Corollary \ref{corollary:expected_M__hattheta_bound}

Key Result

Theorem 1

Given data from a sufficiently stable system, for a wide range of identifiable models $f_t(\cdot,\theta_{\star})$, the mean-squared prediction error corresponding to any least-squares estimate $\widehat{\theta}\in\mathop{\mathrm{arg\,min}}\limits_{\theta\in\mathsf{M}}L_T(\theta)$ satisfies:

Theorems & Definitions (28)

Theorem: Informal Version of Theorem 1
Definition 2.1: Dependency Matrix
Remark 1
Theorem 1: Optimal Non-asymptotic Rates for the Quadratic Prediction Error Method
Remark 2: Result interpretation
Theorem 2
Lemma 4.1
Theorem 3
Corollary 1
Corollary 2
...and 18 more
