Universal Sequence Preconditioning
Annie Marsden, Elad Hazan
TL;DR
Universal Sequence Preconditioning (USP) is a framework for preconditioning target sequences in online sequential prediction. The target is convolved with fixed polynomial coefficients, which in a linear dynamical system (LDS) amounts to applying a polynomial $p_n^{\mathbf{c}}(A)$ to the hidden transition matrix $A$. The authors propose using coefficients of orthogonal polynomials, notably monic Chebyshev (and Legendre), whose spectrum-shrinking effect improves learnability across predictors, including regression and spectral filtering. The analysis extends to asymmetric, marginally stable LDS, with regret that is dimension-free up to polylogarithmic factors. They prove sublinear, dimension-independent regret bounds under spectral constraints and extend Chebyshev bounds to the complex plane, which enables universal guarantees. Experiments on synthetic data (linear and nonlinear LDS, deep RNNs) and the real-world ETTh1 time series show robust performance gains, suggesting USP as a principled initialization or plug-in improvement for diverse sequence-prediction models.
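To make the spectrum-shrinking effect concrete, here is a minimal sketch (not the authors' code) that builds the monic Chebyshev polynomial $T_n(x)/2^{n-1}$ with NumPy and measures its largest magnitude on $[-1, 1]$. The function name `monic_chebyshev_coeffs`, the degree `n = 8`, and the grid resolution are illustrative assumptions. It relies on the standard fact that the monic Chebyshev polynomial of degree $n \ge 1$ has sup norm $2^{1-n}$ on $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from numpy.polynomial import polynomial as poly


def monic_chebyshev_coeffs(n):
    """Power-basis coefficients (lowest degree first) of T_n(x) / 2^(n-1).

    Hypothetical helper for illustration; for n = 0 it returns [1.0].
    """
    in_cheb_basis = np.zeros(n + 1)
    in_cheb_basis[n] = 1.0                       # selects T_n in the Chebyshev basis
    in_power_basis = cheb.cheb2poly(in_cheb_basis)
    return in_power_basis / in_power_basis[-1]   # divide by the leading coefficient


n = 8                                            # illustrative degree (assumption)
coeffs = monic_chebyshev_coeffs(n)

# Evaluate on a grid over [-1, 1]; the endpoints are included on purpose,
# because |T_n(+-1)| = 1 attains the maximum there.
grid = np.linspace(-1.0, 1.0, 2001)
sup_norm = np.max(np.abs(poly.polyval(grid, coeffs)))

print("largest |p_n(x)| on the grid:", sup_norm)
print("theoretical value 2^(1-n):   ", 2.0 ** (1 - n))
```

For a symmetric $A$ with eigenvalues in $[-1, 1]$, the eigenvalues of $p_n(A)$ are $p_n(\lambda_i)$, so this bound on the polynomial carries over directly to the spectral norm of the preconditioned transition matrix.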
Abstract
We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first-ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally stable and asymmetric transition matrices. Finally, extensive synthetic and real-world experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.
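The correspondence between convolving the target and applying a polynomial to the transition matrix can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: it uses an input-free system $x_{t+1} = A x_t$, $y_t = C x_t$, a small random symmetric $A$ with eigenvalues in $[-1, 1]$, and monic Chebyshev coefficients ordered from highest degree to lowest. The names `precondition` and `poly_of_matrix` are hypothetical helpers. Under these assumptions, $\sum_{i=0}^{n} c_i\, y_{t-i} = C\, p(A)\, x_{t-n}$ with $p(x) = \sum_{i=0}^{n} c_i x^{n-i}$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def precondition(y, c):
    """Return z_t = sum_i c[i] * y[t - i] for t = n, ..., T - 1.

    c[0] multiplies the newest observation; n = len(c) - 1.
    """
    n = len(c) - 1
    return np.stack(
        [sum(c[i] * y[t - i] for i in range(n + 1)) for t in range(n, len(y))]
    )


def poly_of_matrix(c, A):
    """Evaluate p(A) = sum_i c[i] * A^(n - i) with Horner's rule."""
    identity = np.eye(A.shape[0])
    result = np.zeros_like(A)
    for ci in c:
        result = result @ A + ci * identity
    return result


rng = np.random.default_rng(0)
d, out_dim, T, n = 4, 2, 30, 5                   # illustrative sizes (assumptions)

# Symmetric A with eigenvalues drawn from [-1, 1].
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(rng.uniform(-1.0, 1.0, size=d)) @ Q.T
C = rng.standard_normal((out_dim, d))

# Roll out the input-free system: x_{t+1} = A x_t, y_t = C x_t.
xs = [rng.standard_normal(d)]
for _ in range(T - 1):
    xs.append(A @ xs[-1])
xs = np.stack(xs)
ys = xs @ C.T

# Monic Chebyshev coefficients, highest degree first, so c[0] = 1.
unit = np.zeros(n + 1)
unit[n] = 1.0
power = cheb.cheb2poly(unit)                     # lowest degree first
c = (power / power[-1])[::-1]

z = precondition(ys, c)                          # convolved (preconditioned) targets
p_of_A = poly_of_matrix(c, A)
expected = np.stack([C @ p_of_A @ xs[t - n] for t in range(n, T)])

print("identity holds:", np.allclose(z, expected))
print("spectral norm of A:   ", np.linalg.norm(A, 2))
print("spectral norm of p(A):", np.linalg.norm(p_of_A, 2))
```

Because `c[0] = 1`, the preconditioned target keeps the newest observation $y_t$ with unit weight, so a prediction of the convolved sequence can be converted back into a prediction of $y_t$ by subtracting the known terms $c_1 y_{t-1}, \dots, c_n y_{t-n}$.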
