
Signature Isolation Forest

Marta Campi, Guillaume Staerman, Gareth W. Peters, Tomoko Matsui

TL;DR

The paper targets functional anomaly detection and the sensitivity of the existing Functional Isolation Forest (FIF) to representation choices. It introduces two algorithms, K-SIF and SIF, that embed functional paths via rough path signatures and use either a signature kernel or coordinate signatures to perform nonlinear, dictionary-free splits within an isolation forest framework. Through parameter sweeps, swap-order tests, and real-data benchmarks, the authors show that K-SIF often surpasses FIF and that SIF achieves state-of-the-art performance with robustness and computational efficiency. The work provides practical, data-driven tools for reliable, scalable functional anomaly detection across diverse datasets.

Abstract

Functional Isolation Forest (FIF) is a recent state-of-the-art Anomaly Detection (AD) algorithm designed for functional data. It relies on a tree partition procedure where an abnormality score is computed by projecting each curve observation on a drawn dictionary through a linear inner product. This linear inner product and the dictionary are a priori choices that strongly influence the algorithm's performance and might lead to unreliable results, particularly with complex datasets. This work addresses these challenges by introducing Signature Isolation Forest, a novel class of AD algorithms leveraging the signature transform from rough path theory. Our objective is to remove the constraints imposed by FIF by proposing two algorithms that specifically target the linearity of the FIF inner product and the choice of the dictionary. We provide several numerical experiments, including a real-world applications benchmark, showing the relevance of our methods.
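
To make the idea of isolating curves through signature coordinates more concrete, here is a minimal sketch. It is not the authors' implementation: the truncation at depth 2, the time-augmented paths, the toy data, and the helper names `signature_features` and `isolation_depth` are all assumptions made for illustration. Each curve is mapped to its depth-1 and depth-2 signature terms, and a curve is then isolated by repeated random splits on a randomly chosen coordinate, in the spirit of an isolation forest.

```python
import math
import random


def signature_features(path):
    """Flattened depth-1 and depth-2 signature terms of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Cross term with the path so far, plus the segment's own term.
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1 + [s2[i][j] for i in range(d) for j in range(d)]


def isolation_depth(target, feats, rng, depth=0, max_depth=10):
    """Number of random splits needed to isolate `target` (hypothetical helper)."""
    if len(feats) <= 1 or depth >= max_depth:
        return depth
    # Only split on coordinates that actually vary in the current node.
    candidates = [k for k in range(len(target))
                  if min(f[k] for f in feats) < max(f[k] for f in feats)]
    if not candidates:
        return depth
    k = rng.choice(candidates)
    lo = min(f[k] for f in feats)
    hi = max(f[k] for f in feats)
    threshold = rng.uniform(lo, hi)
    # Keep only the curves that fall on the same side as the target.
    same_side = [f for f in feats if (f[k] < threshold) == (target[k] < threshold)]
    return isolation_depth(target, same_side, rng, depth + 1, max_depth)


rng = random.Random(0)
grid = [k / 49 for k in range(50)]
# Toy data (assumption): 20 similar curves and one curve with a linear drift.
amplitudes = [1 + 0.1 * rng.uniform(-1, 1) for _ in range(20)]
curves = [[(t, a * math.sin(math.pi * t)) for t in grid] for a in amplitudes]
curves.append([(t, math.sin(math.pi * t) + 3 * t) for t in grid])

feats = [signature_features(c) for c in curves]
n_trees = 100
# Simplification: a fresh random branch is drawn per curve and per repetition.
mean_depth = [sum(isolation_depth(f, feats, rng) for _ in range(n_trees)) / n_trees
              for f in feats]
print("anomaly:", mean_depth[-1])
print("normal average:", sum(mean_depth[:-1]) / len(mean_depth[:-1]))
```

A shorter average depth indicates a curve that is easier to isolate, which is the usual isolation forest notion of abnormality. In this toy setting the drifting curve should be isolated after far fewer splits than the others.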

Paper Structure (19 sections, 1 theorem, 33 equations, 12 figures, 9 tables)

Key Result

Proposition 1.1

Let $\mathbf{X} \in \mathcal{F}^d([s, t])$ and $\mathbf{Y} \in \mathcal{F}^d([s, t])$ be two functions with bounded variation. Then for any index $(i_1, \ldots, i_k) \in \{1, \ldots, d \}^k$,

Figures (12)

  • Figure 1: Geometric visualization of depth-2 signature terms, where $S^{(1,2)}$ (cyan region) and $S^{(2,1)}$ (purple region) represent areas corresponding to coordinate signatures. The displacement terms $\Delta X_1$ and $\Delta X_2$ along each axis capture the depth-1 terms of the transform.
  • Figure 2: AUC for the ROC curve w.r.t. the number of split windows on the first (top) and the second (bottom) datasets for the three dictionaries.
  • Figure 3: Anomaly score for normal (purple) and abnormal (yellow) data for SIF, K-SIF and FIF with Brownian and Cosine dictionaries.
  • Figure 4: Bar plot of AUC differences between K-SIF and FIF with a Brownian motion kernel (positive means K-SIF performs better). The inner product chosen for FIF is L2 (top) and L2 of the derivative (bottom).
  • Figure 5: Brownian Motion Process Results. Kendall tau correlation between the score returned by SIF (purple) and K-SIF with different depth values, $\omega=3$ (left) and $\omega=5$ (right), for the three dictionaries: 'Brownian' (blue), 'Cosine' (orange) and 'Gaussian wavelets' (green) on three-dimensional Brownian paths.
  • ...and 7 more figures
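
The geometric reading of the depth-1 and depth-2 terms described in the Figure 1 caption can be checked on a small example. The sketch below assumes a piecewise-linear path in two dimensions; the helper name `signature_depth2` is illustrative and not taken from the paper. It computes the displacements (depth 1) and the iterated integrals $S^{(i,j)}$ (depth 2) segment by segment.

```python
def signature_depth2(path):
    """Depth-1 and depth-2 signature terms of a piecewise-linear path.

    `path` is a list of d-dimensional points; the helper name is illustrative.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Appending a linear segment: cross term with the path so far,
                # plus the segment's own depth-2 term inc_i * inc_j / 2.
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


# A path that first moves right, then up.
path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
s1, s2 = signature_depth2(path)
levy_area = 0.5 * (s2[0][1] - s2[1][0])
print(s1, s2[0][1], s2[1][0], levy_area)
```

For this path the displacements are both 1, $S^{(1,2)} = 1$ and $S^{(2,1)} = 0$. Their sum equals the product of the two displacements, and half their difference (0.5) is the signed area enclosed between the path and the chord joining its endpoints.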

Theorems & Definitions (5)

  • Definition 2.1
  • Definition 3.1
  • Proposition 1.1: Chen's Identity, chen1958integration
  • Remark 1.2
  • Remark 2.1: Link between K-SIF and FIF
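
Chen's identity, listed above as Proposition 1.1, can be illustrated numerically. In its standard textbook form (the notation may differ from the paper's), the signature of a concatenated path $\mathbf{X} * \mathbf{Y}$ is obtained from the signatures of the two pieces; at depth 2 this reads $S^{(i,j)}(\mathbf{X} * \mathbf{Y}) = S^{(i,j)}(\mathbf{X}) + S^{(i)}(\mathbf{X})\, S^{(j)}(\mathbf{Y}) + S^{(i,j)}(\mathbf{Y})$. The sketch below assumes piecewise-linear paths, computes depth-2 terms directly from a double sum over segment increments, and compares both sides. The helper name and the example paths are illustrative.

```python
def signature_depth2(path):
    """Depth-1 and depth-2 terms via a direct double sum over segment increments."""
    d = len(path[0])
    incs = [[q[i] - p[i] for i in range(d)] for p, q in zip(path, path[1:])]
    s1 = [sum(inc[i] for inc in incs) for i in range(d)]
    s2 = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            for l, inc_l in enumerate(incs):
                # Contribution of a single linear segment.
                s2[i][j] += 0.5 * inc_l[i] * inc_l[j]
                # Contributions of every earlier segment paired with this one.
                for inc_k in incs[:l]:
                    s2[i][j] += inc_k[i] * inc_l[j]
    return s1, s2


# Y starts where X ends, so the concatenation is a single continuous path.
X = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
Y = [(3.0, 1.0), (2.0, 4.0), (5.0, 5.0)]
XY = X + Y[1:]

s1x, s2x = signature_depth2(X)
s1y, s2y = signature_depth2(Y)
s1xy, s2xy = signature_depth2(XY)

# Largest deviation between the two sides of the depth-2 identity.
max_err = max(
    abs(s2xy[i][j] - (s2x[i][j] + s1x[i] * s1y[j] + s2y[i][j]))
    for i in range(2) for j in range(2)
)
print(max_err)
```

The deviation should be zero up to floating-point rounding. The same identity is what allows signatures to be accumulated segment by segment rather than recomputed from scratch.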