Learning from the past, predicting the statistics for the future, learning an evolving system

Daniel Levin; Terry Lyons; Hao Ni

TL;DR

The paper addresses learning from streaming data by representing full streams with signatures from rough path theory to enable non-parametric regression on path space. It introduces the expected signature (ES) framework, where outputs are modeled as linear functionals of input signatures plus noise, yielding a powerful, dimension-reducing feature set with strong approximation properties. By connecting ES to time series via time-joined embeddings, the authors show that classical models like AR and ARCH are special cases of ES, unifying parametric and non-parametric approaches under a common path-space regression paradigm. Empirical results on simulated time series demonstrate that ES achieves GP-level predictive accuracy with substantially lower computational cost and favorable robustness, highlighting practical benefits for streaming-data inference.
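
For orientation, the two objects named in this summary can be written down in standard rough path notation. This is only a sketch: the symbols $X$, $Y$, $L$, $\varepsilon$ and the interval $[0,T]$ are notational assumptions made here, and the paper's own formulation may differ in detail.

```latex
% Signature of a path X over [0,T]: the graded sequence of iterated integrals.
S(X)_{0,T} = \bigl(1,\; X^{1},\; X^{2},\; \dots \bigr),
\qquad
X^{k} = \int_{0 < u_1 < \cdots < u_k < T}
        dX_{u_1} \otimes \cdots \otimes dX_{u_k}.

% Regression on path space: the response is a linear functional L of the
% input signature plus noise with zero conditional mean (assumed form).
Y = L\bigl(S(X)_{0,T}\bigr) + \varepsilon,
\qquad
\mathbb{E}\,[\varepsilon \mid X] = 0.
```

Truncating the signature at a finite level $k$ leaves a finite feature vector, which is what makes ordinary linear regression applicable.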

Abstract

We bring the theory of rough paths to the study of non-parametric statistics on streamed data. We discuss the problem of regression where the input variable is a stream of information, and the dependent response is also (potentially) a stream. A certain graded feature set of a stream, known in the rough path literature as the signature, has a universality that formally allows linear regression to be used to characterise the functional relationship between independent explanatory variables and the conditional distribution of the dependent response. This approach, via linear regression on the signature of the stream, is almost totally general, and yet it still allows explicit computation. The grading allows truncation of the feature set and so leads to an efficient local description for streams (rough paths). In the statistical context this method offers potentially significant, even transformational, dimension reduction. By way of illustration, our approach is applied to stationary time series, including the familiar AR model and ARCH model. In the numerical examples we examined, our predictions achieve similar accuracy to the Gaussian Process (GP) approach with much lower computational cost, especially when the sample size is large.
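
To make "linear regression on the signature of the stream" concrete, the following is a minimal sketch and not the authors' implementation. It assumes piecewise-linear input paths, truncates the signature at level 2, and fits the linear functional by ordinary least squares with NumPy. The function names `signature_level2`, `fit_linear_on_signatures` and `predict` are hypothetical, and the data are synthetic.

```python
import numpy as np


def signature_level2(path):
    """Truncated signature (levels 0, 1, 2) of a piecewise-linear path.

    path: array-like of shape (n_points, d).
    Returns a flat feature vector of length 1 + d + d*d.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment so far
    s2 = np.zeros((d, d))   # level 2: iterated integrals so far
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + (old level 1) x delta + (delta x delta) / 2
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate(([1.0], s1, s2.ravel()))


def fit_linear_on_signatures(paths, y):
    """Least-squares fit of y ~ <w, signature(path)> (hypothetical helper)."""
    features = np.stack([signature_level2(p) for p in paths])
    w, *_ = np.linalg.lstsq(features, np.asarray(y, dtype=float), rcond=None)
    return w


def predict(paths, w):
    """Apply the fitted linear functional to the signatures of new paths."""
    features = np.stack([signature_level2(p) for p in paths])
    return features @ w


# Synthetic illustration (assumed data, not from the paper): 200 random
# two-dimensional paths whose response is an exact linear functional of
# the level-2 signature.
rng = np.random.default_rng(0)
paths = rng.normal(size=(200, 10, 2)).cumsum(axis=1)
true_w = rng.normal(size=7)
y = np.stack([signature_level2(p) for p in paths]) @ true_w
w_hat = fit_linear_on_signatures(paths, y)
y_hat = predict(paths, w_hat)
```

The grading is what keeps the feature vector small: a path in dimension `d` truncated at level 2 gives `1 + d + d*d` features regardless of how many points the stream contains. A higher truncation level would need further terms of Chen's identity, which this sketch does not implement.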

