Learning to Forget: Bayesian Time Series Forecasting using Recurrent Sparse Spectrum Signature Gaussian Processes
Csaba Tóth, Masaki Adachi, Michael A. Osborne, Harald Oberhauser
TL;DR
This work introduces Random Fourier Decayed Signature Features (RFDSF) to inject a principled forgetting mechanism into signature-based time-series representations, enabling dynamic, data-driven adaptation of context length. By embedding RFDSF into a variational Gaussian Process framework (RS3GP), the authors achieve scalable, one-pass, autoregressive forecasting that stays faithful to probabilistic uncertainty while processing long sequences efficiently. Empirical results show competitive performance with state-of-the-art diffusion models and clear speed advantages over traditional GP baselines, with strong results on both synthetic and diverse real-world datasets. The approach offers a practical, interpretable pathway for combining structured path representations with modern Bayesian inference in time-series forecasting.
Abstract
The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning, for example as a covariance function for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features and develop Random Fourier Decayed Signature Features (RFDSF), which we combine with Gaussian processes (GPs). This results in a scalable Bayesian time series forecasting algorithm with variational inference that processes and transforms a time series into a joint predictive distribution over time steps in one pass using recurrence. For example, it processes a sequence of $10^4$ steps in $\approx 10^{-2}$ seconds and in $< 1\,\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.
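To make the idea of forgetting through recurrence concrete, the following is a minimal sketch of an exponentially decayed, signature-style feature recurrence built on standard random Fourier features. It is an illustration under stated assumptions, not the RFDSF construction from the paper: the function names, the single scalar decay factor `lam`, the use of feature increments, and the truncation at two levels are all hypothetical choices made for this example.

```python
import numpy as np

def rff(x, W, b):
    """Standard random Fourier feature map: sqrt(2/D) * cos(x W^T + b)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(x @ W.T + b)

def decayed_features(X, lam, W1, b1, W2, b2):
    """One recurrent pass over X of shape (T, d).

    Returns level-1 features of shape (T-1, D) and level-2 features of shape
    (T-1, D, D). `lam` in (0, 1] is a hypothetical decay factor: lam = 1 keeps
    the whole history, smaller values down-weight older increments.
    """
    d1 = np.diff(rff(X, W1, b1), axis=0)  # increments of the first feature map
    d2 = np.diff(rff(X, W2, b2), axis=0)  # increments of an independent second map
    D = W1.shape[0]
    s1 = np.zeros(D)
    s2 = np.zeros((D, D))
    out1, out2 = [], []
    for t in range(d1.shape[0]):
        # Level 2 uses the level-1 state from the previous step, decayed once more.
        s2 = lam**2 * s2 + np.outer(lam * s1, d2[t])
        # Level 1 is an exponentially weighted sum of increments.
        s1 = lam * s1 + d1[t]
        out1.append(s1.copy())
        out2.append(s2.copy())
    return np.stack(out1), np.stack(out2)

# Example usage with arbitrary toy sizes (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
b1, b2 = rng.uniform(0, 2 * np.pi, 4), rng.uniform(0, 2 * np.pi, 4)
F1, F2 = decayed_features(X, 0.7, W1, b1, W2, b2)
```

Each step costs a fixed amount of work regardless of how long the history is, which is the property that allows a sequence to be processed in a single pass. In a forecasting model, the per-step features would then be passed to a probabilistic regression layer, which this sketch does not include.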
