Universal Sequence Preconditioning

Annie Marsden, Elad Hazan

TL;DR

Universal Sequence Preconditioning develops a framework for preconditioning target sequences in online sequential prediction by convolving with fixed polynomial coefficients, effectively applying a polynomial $p_n^{\mathbf{c}}(A)$ to the hidden transition matrix $A$ in linear dynamical systems. The authors propose using orthogonal polynomials, notably monic Chebyshev (and Legendre) coefficients, to achieve spectrum-shrinking effects that improve learnability across predictors, including regression and spectral filtering, and extend to asymmetric, marginally stable LDS with near-dimension-free regret up to polylog factors. They prove sublinear, dimension-independent regret bounds under spectral constraints and provide an extension of Chebyshev bounds to the complex plane, enabling universal guarantees. Empirical results on synthetic data (linear/nonlinear LDS and deep RNNs) and real-world ETTh1 time-series demonstrate robust performance gains and broad applicability, suggesting USP as a principled initialization or plug-in improvement for diverse sequence-prediction models.
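The link between convolution and a matrix polynomial can be written out in a few lines. This is a sketch under assumed notation that the summary itself does not fix: a state-space form $\mathbf{x}_{t+1} = A\mathbf{x}_t + B\mathbf{u}_t$, $\mathbf{y}_t = C\mathbf{x}_t$, coefficients $c_0 = 1, c_1, \dots, c_n$, and $p_n^{\mathbf{c}}(x) = \sum_{i=0}^{n} c_i x^{n-i}$. The paper's own indexing may differ.

$$
\mathbf{x}_{t-i} = A^{n-i}\mathbf{x}_{t-n} + \sum_{j=1}^{n-i} A^{j-1} B\,\mathbf{u}_{t-i-j}, \qquad 0 \le i \le n,
$$

$$
\sum_{i=0}^{n} c_i\,\mathbf{y}_{t-i} = C\,p_n^{\mathbf{c}}(A)\,\mathbf{x}_{t-n} + \sum_{i=0}^{n} c_i \sum_{j=1}^{n-i} C A^{j-1} B\,\mathbf{u}_{t-i-j}.
$$

The second sum involves only the $n$ most recent inputs $\mathbf{u}_{t-1}, \dots, \mathbf{u}_{t-n}$, so the dependence on the distant past enters only through $p_n^{\mathbf{c}}(A)$. For the monic Chebyshev choice $p_n(x) = T_n(x)/2^{n-1}$, a standard fact is $\max_{x \in [-1,1]} |p_n(x)| = 2^{1-n}$, which is the spectrum-shrinking effect in the symmetric case with real eigenvalues in $[-1,1]$.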

Abstract

We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first-ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally stable and asymmetric transition matrices. Finally, extensive synthetic and real-world experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.
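The claim that convolving the target applies a polynomial to the transition matrix can be checked numerically. The snippet below is a minimal sketch, not the authors' algorithm. It assumes an input-free system $\mathbf{x}_{t+1} = A\mathbf{x}_t$, $\mathbf{y}_t = C\mathbf{x}_t$ with a symmetric $A$ whose eigenvalues lie in $[-1, 1]$. The helper `monic_chebyshev_coeffs` and all variable names are hypothetical, and the index convention (coefficient `c[i]` multiplies `y[t - i]`) is an assumption.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_coeffs(n):
    """Power-basis coefficients of T_n(x) / 2^(n-1), highest degree first (n >= 1)."""
    # cheb2poly returns power-basis coefficients ordered from lowest degree.
    low_first = cheb.cheb2poly([0] * n + [1])
    return low_first[::-1] / 2 ** (n - 1)


rng = np.random.default_rng(0)
d, n, T = 6, 5, 40  # hidden dimension, polynomial degree, horizon (illustrative)

# Assumption: symmetric A with real eigenvalues in [-1, 1], one of them
# exactly 1 so the system is marginally stable.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.append(rng.uniform(-1.0, 1.0, size=d - 1), 1.0)
A = Q @ np.diag(eigs) @ Q.T
C = rng.standard_normal((1, d))

# Roll out the input-free system and record the observations.
states = [rng.standard_normal(d)]
for _ in range(T):
    states.append(A @ states[-1])
states = np.array(states)  # shape (T + 1, d)
y = states @ C.T           # shape (T + 1, 1)

c = monic_chebyshev_coeffs(n)
t = T

# Left side: convolve the target with the coefficients.
preconditioned = sum(c[i] * y[t - i] for i in range(n + 1))

# Right side: apply p_n(A) = sum_i c[i] * A^(n - i) to the state at time t - n.
p_of_A = sum(c[i] * np.linalg.matrix_power(A, n - i) for i in range(n + 1))
direct = C @ p_of_A @ states[t - n]

shrunk_norm = np.linalg.norm(p_of_A, 2)                         # at most 2^(1 - n)
plain_norm = np.linalg.norm(np.linalg.matrix_power(A, n), 2)    # stays at 1 here
```

Here `preconditioned` and `direct` agree up to floating-point error, and `shrunk_norm` is bounded by $2^{1-n}$ even though `plain_norm` does not decay. The asymmetric, complex-spectrum case treated in the paper is not covered by this sketch.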

Paper Structure

This paper contains 40 sections, 14 theorems, 179 equations, 5 figures, 4 tables, 4 algorithms.

Key Result

Theorem 2.1

Let $\{\mathbf{u}_t\}_{t=1}^T \in \mathbb{C}^{d_{\text{in}}}$ be any sequence of inputs satisfying $\|\mathbf{u}_t\|_2 \le 1$, and let $\{\mathbf{y}_t\}_{t=1}^T \in \mathbb{C}^{d_{\text{out}}}$ be the corresponding outputs of some linear dynamical system. Then the predictions $\hat{\mathbf{y}}_1, \dots, \hat{\mathbf{y}}_T$ from Algorithm alg:preconditio

Figures (5)

  • Figure 1: Absolute prediction error on final $200$ predictions averaged over 10 independent runs for 10-layer LSTM with layer dimension $100$ using Adam optimizer and sweeping over learning rates for each run.
  • Figure 2: The origin minimizes the intersection area up to factor $\frac{1}{4}$.
  • Figure 3: Absolute prediction error averaged over 200 independent runs with data generated from a linear dynamical system with varying complex threshold.
  • Figure 4: Absolute prediction error averaged over 200 independent runs with data generated from a nonlinear dynamical system with varying complex threshold.
  • Figure 5: Absolute prediction error of a 10-layer DNN model averaged over 200 independent runs.

Theorems & Definitions (27)

  • Theorem 2.1
  • Theorem 2.2
  • Lemma 3.1
  • Lemma 3.2
  • Theorem D.1: General Form of Theorem \ref{thm:convex_relaxation}
  • Proof: Proof of Theorem \ref{thm:convex_relaxation_general}
  • Theorem: Restatement of Theorem \ref{thm:convex_relaxation}
  • Proof
  • Theorem E.1
  • Theorem: Detailed Version of Theorem \ref{thm:main_regret}
  • ...and 17 more