Rate-Optimal Non-Asymptotics for the Quadratic Prediction Error Method

Charis Stamouli, Ingvar Ziemann, George J. Pappas

TL;DR

The paper addresses non-asymptotic guarantees for the quadratic prediction error method in time-varying nonlinear predictor models, establishing rate-optimal bounds that match classical asymptotic rates up to constants. It develops a martingale-offset-based analysis to control the time-varying regression functions and delivers a leading $\mathcal{O}\left(\frac{d_\theta \sigma_w^2}{T}\right)$ error decay, with a burn-in time $T_0$ that depends polynomially on model parameters and sub-Gaussian noise, and logarithmic factors that vanish for large $T$. The authors apply the results to a class of identifiable ARMA models, yielding the first non-asymptotic, rate-optimal identification guarantees in this nonlinear setting. Overall, the work provides finite-sample performance guarantees for nonlinear time-series prediction and identification, extending non-asymptotic analysis beyond linear models and enabling ARMA identification with sharp rates.

Abstract

We study the quadratic prediction error method -- i.e., nonlinear least squares -- for a class of time-varying parametric predictor models satisfying a certain identifiability condition. While this method is known to asymptotically achieve the optimal rate for a wide range of problems, there have been no non-asymptotic results matching these optimal rates outside of a select few, typically linear, model classes. By leveraging modern tools from learning with dependent data, we provide the first rate-optimal non-asymptotic analysis of this method for our more general setting of nonlinearly parametrized model classes. Moreover, we show that our results can be applied to a particular class of identifiable AutoRegressive Moving Average (ARMA) models, resulting in the first optimal non-asymptotic rates for identification of ARMA models.
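
To make the method concrete, here is a minimal sketch of the quadratic prediction error method on a simulated ARMA(1,1) model $y_t = a\,y_{t-1} + w_t + c\,w_{t-1}$. It is an illustration, not the paper's procedure or experimental setup. The assumptions are the model order, the zero initial conditions, the Gaussian noise, and the hypothetical helper names `simulate_arma11`, `pem_loss`, and `grid_argmin`. A brute-force grid search stands in for a proper nonlinear least-squares solver, only to keep the example dependency-free.

```python
import random

def simulate_arma11(a, c, T, sigma_w=1.0, seed=0):
    """Simulate y_t = a*y_{t-1} + w_t + c*w_{t-1} with Gaussian noise (assumed setup)."""
    rng = random.Random(seed)
    y, y_prev, w_prev = [], 0.0, 0.0
    for _ in range(T):
        w = rng.gauss(0.0, sigma_w)
        y_t = a * y_prev + w + c * w_prev
        y.append(y_t)
        y_prev, w_prev = y_t, w
    return y

def pem_loss(theta, y):
    """Average squared one-step prediction error for theta = (a, c)."""
    a, c = theta
    y_prev, eps_prev, total = 0.0, 0.0, 0.0
    for y_t in y:
        y_hat = a * y_prev + c * eps_prev  # one-step-ahead prediction from past data
        eps = y_t - y_hat                  # prediction error, used as the noise estimate
        total += eps * eps
        y_prev, eps_prev = y_t, eps
    return total / len(y)

def grid_argmin(y, lo=-0.9, hi=0.9, steps=37):
    """Brute-force minimizer of pem_loss over a grid of (a, c) values."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(((a, c) for a in grid for c in grid),
               key=lambda th: pem_loss(th, y))

# Illustrative values only: true parameters a = 0.5, c = 0.3, T = 1000 samples.
y = simulate_arma11(a=0.5, c=0.3, T=1000)
theta_hat = grid_argmin(y)
print("estimate (a, c):", theta_hat)
print("loss at estimate:", pem_loss(theta_hat, y))
```

The predictor at time $t$ depends on the whole past through the recursively computed prediction errors, which is why the loss is nonlinear in $c$ even though the model is linear in the data. In practice, a gradient-based nonlinear least-squares routine would replace the grid search.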

Paper Structure

This paper contains 10 sections, 16 theorems, 191 equations.

Key Result

Theorem 1

Given data from a sufficiently stable system, for a wide range of identifiable models $f_t(\cdot,\theta_{\star})$, the mean-squared prediction error corresponding to any least-squares estimate $\widehat{\theta}\in\mathop{\mathrm{arg\,min}}\limits_{\theta\in\mathsf{M}}L_T(\theta)$ satisfies a bound whose leading term decays as $\mathcal{O}\left(\frac{d_\theta \sigma_w^2}{T}\right)$ once $T$ exceeds the burn-in time $T_0$.

Theorems & Definitions (28)

  • Theorem: Informal Version of Theorem 1
  • Definition 2.1: Dependency Matrix
  • Remark 1
  • Theorem 1: Optimal Non-asymptotic Rates for the Quadratic Prediction Error Method
  • Remark 2: Result interpretation
  • Theorem 2
  • Lemma 4.1
  • Theorem 3
  • Corollary 1
  • Corollary 2
  • ...and 18 more