A Primer on the Signature Method in Machine Learning

Ilya Chevyrev, Andrey Kormilitzin

TL;DR

The paper introduces the signature method as a principled, geometry-driven representation for time-ordered data by encoding a path with the infinite sequence of iterated integrals $S(X)_{a,b}$. It develops the theoretical foundations—definition, invariances, and algebraic structure via the shuffle product, Chen's identity, time-reversal, and the log-signature—while connecting signatures to rough path theory and the moment problem for random paths. On the practical side, it details how to convert discrete data into paths, apply augmentations and transformations, compute signatures with available software, and use signature/log-signature features in machine learning, including a handwritten digit classification example. The framework provides a flexible, nonparametric feature extraction pipeline with strong geometric interpretation, enabling effective modeling of sequential data across finance, healthcare, computer vision, and beyond.
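The step "convert discrete data into paths" can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it shows two common embeddings of a one-dimensional sequence, a time augmentation and a lead-lag transformation. The function names `time_augment` and `lead_lag`, and the choice of a uniform time grid on $[0,1]$, are assumptions made for this sketch.

```python
import numpy as np

def time_augment(values):
    """Embed a 1-D sequence as the 2-D path (t, x_t), with t uniform on [0, 1] (assumed grid)."""
    values = np.asarray(values, dtype=float)
    t = np.linspace(0.0, 1.0, len(values))
    return np.column_stack([t, values])

def lead_lag(values):
    """Embed a 1-D sequence as a 2-D (lead, lag) path.

    The lead coordinate moves to the next value first, then the lag
    coordinate catches up: (x0,x0), (x1,x0), (x1,x1), (x2,x1), ...
    """
    values = np.asarray(values, dtype=float)
    doubled = np.repeat(values, 2)   # x0, x0, x1, x1, ...
    lead = doubled[1:]               # x0, x1, x1, x2, ...
    lag = doubled[:-1]               # x0, x0, x1, x1, ...
    return np.column_stack([lead, lag])

augmented = time_augment([5.0, 7.0, 9.0])
embedded = lead_lag([1.0, 3.0, 2.0])
```

Either array can be read as the vertices of a piecewise linear path in $\mathbb R^2$, which is the form of input a signature computation expects.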

Abstract

We provide an introduction to the signature method, focusing on its theoretical properties and machine learning applications. Our presentation is divided into two parts. In the first part, we present the definition and fundamental properties of the signature of a path. The signature is a sequence of numbers associated with a path that captures many of its important analytic and geometric properties. As a sequence of numbers, the signature serves as a compact description (dimension reduction) of a path. In presenting its theoretical properties, we assume only familiarity with classical real analysis and integration, and supplement theory with straightforward examples. We also mention several advanced topics, including the role of the signature in rough path theory. In the second part, we present practical applications of the signature to the area of machine learning. The signature method is a non-parametric way of transforming data into a set of features that can be used in machine learning tasks. In this method, data are converted into multi-dimensional paths, by means of embedding algorithms, of which the signature is then computed. We describe this pipeline in detail, making a link with the properties of the signature presented in the first part. We furthermore review some of the developments of the signature method in machine learning and, as an illustrative example, present a detailed application of the method to handwritten digit classification.
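To make "a sequence of numbers associated with a path" concrete, the following sketch computes the first two levels of the signature of a piecewise linear path. It relies on two standard facts: a straight segment with increment $\Delta$ has level-one term $\Delta$ and level-two term $\Delta \otimes \Delta / 2$, and Chen's identity combines the signatures of consecutive segments. The function name `signature_depth2` is hypothetical, and truncating at level two is a simplification chosen for this illustration; it is not a replacement for a dedicated signature library.

```python
import numpy as np

def signature_depth2(points):
    """Levels 1 and 2 of the signature of the piecewise linear path through `points`.

    Returns (level1, level2), where level1[i] = S(X)^i and
    level2[i, j] = S(X)^{i,j} (zero-based coordinate indices).
    """
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(points, axis=0):
        # Chen's identity truncated at level 2: the cross term pairs the
        # path so far with the new segment; the last term is the segment's
        # own level-2 signature.
        level2 += np.outer(level1, delta) + np.outer(delta, delta) / 2.0
        level1 += delta
    return level1, level2

# Path that moves right by one unit, then up by one unit.
s1, s2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])
```

For this path the level-one terms are the total increments $(1, 1)$, and the antisymmetric part of level two gives the signed (Lévy) area $1/2$ enclosed between the path and its chord.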

Paper Structure

This paper contains 34 sections, 12 theorems, 145 equations, 25 figures, 3 tables.

Key Result

theorem 1

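Consider a path $X : [a,b] \to \mathbb R^d$ and two multi-indexes $I = (i_1,\ldots, i_k)$ and $J = (j_1,\ldots, j_m)$ with $i_1,\ldots,i_k,j_1,\ldots, j_m \in \{ 1,\ldots, d\}$. Then

$$S(X)^I_{a,b} \, S(X)^J_{a,b} = \sum_{K \in I \sqcup\!\sqcup J} S(X)^K_{a,b},$$

where $I \sqcup\!\sqcup J$ denotes the shuffle product of $I$ and $J$ (definition 2).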
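As a small worked instance of the shuffle product identity (theorem 1), take the single-letter indexes $I = (1)$ and $J = (2)$. Their shuffles are $(1,2)$ and $(2,1)$, so the identity states that the product of two level-one terms is a sum of two level-two terms:

$$S(X)^{1}_{a,b} \, S(X)^{2}_{a,b} = S(X)^{1,2}_{a,b} + S(X)^{2,1}_{a,b}.$$

The path chosen here is an illustrative assumption: let $X$ run in $\mathbb R^2$ from $(0,0)$ to $(1,0)$ and then to $(1,1)$. Both coordinates increase by $1$, so $S(X)^1 = S(X)^2 = 1$. The first coordinate equals $1$ while the second one moves, giving $S(X)^{1,2} = 1$, and the second coordinate equals $0$ while the first one moves, giving $S(X)^{2,1} = 0$. Both sides of the identity therefore equal $1$.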

Figures (25)

  • Figure 1: Example of two-dimensional smooth paths.
  • Figure 2: Example of non-smooth path.
  • Figure 3: Example of a 2-dimensional path parametrized in \ref{eq:differentialsOfPath}.
  • Figure 4: Example of sequential Picard approximation to the true solution.
  • Figure 5: The path $X : [0,T]\to \mathbb R^2$ is an approximation of a two-dimensional Brownian motion and $\tilde{X}:[0,T]\to\mathbb R^2$ is a small perturbation of $X$ at every point. The horizontal axis denotes time, and the orange and blue lines indicate the two components $X^1,X^2$; likewise for $\tilde{X}$. The paths $Y,\tilde{Y}:[0,T] \to \mathbb R$ are solutions to the ODEs $d Y_t = V(Y_t) \, d X_t$ and $d \tilde{Y}_t = V(\tilde{Y}_t) \, d \tilde{X}_t$ with the same initial point $Y_0=\tilde{Y}_0$ and with vector fields $V = (V_1, V_2)$, where $V_1(y)=y/100$ and $V_2(y)=y\sin(y)/100$. Although $X$ and $\tilde{X}$ are close in the uniform norm, $Y$ and $\tilde{Y}$ differ significantly.
  • ...and 20 more figures

Theorems & Definitions (26)

  • definition 1: Signature
  • definition 2: Shuffle product
  • theorem 1: Shuffle product identity
  • proof
  • theorem 2: Chen's identity
  • proof
  • definition 3: Formal power series
  • definition 4: Concatenation
  • theorem 3: Chen's identity
  • corollary 1: Signature of piecewise linear path
  • ...and 16 more