Novelty detection on path space
Ioannis Gasteratos, Antoine Jacquier, Maud Lemercier, Terry Lyons, Cristopher Salvi
TL;DR
This paper frames novelty detection for trajectories as a hypothesis test on path space with signature-based test statistics. Using transportation-cost inequalities, it derives tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded vector fields. Exploiting the shuffle product, it gives exact formulae for smooth surrogates of conditional value-at-risk (CVaR) in terms of expected signatures, leading to one-class SVM algorithms that optimise smooth CVaR objectives. It also establishes lower bounds on type-II error for alternatives with finite first moment, which yield general power bounds when the reference measure and the alternative are absolutely continuous with respect to each other. Type-I error and statistical power are evaluated numerically on synthetic anomalous diffusion data and real RNA nanopore sequencing data.
Abstract
We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded vector fields, yielding estimates of quantiles and p-values. Exploiting the shuffle product, we derive exact formulae for smooth surrogates of conditional value-at-risk (CVaR) in terms of expected signatures, leading to new one-class SVM algorithms optimising smooth CVaR objectives. We then establish lower bounds on type-$\mathrm{II}$ error for alternatives with finite first moment, giving general power bounds when the reference measure and the alternative are absolutely continuous with respect to each other. Finally, we numerically evaluate the type-$\mathrm{I}$ error and statistical power of signature-based test statistics, using synthetic anomalous diffusion data and real-world molecular biology data.
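The shuffle product mentioned above is what makes products of signature coordinates linear in the signature itself, so that expectations of polynomial functionals reduce to expected signatures. As a minimal illustrative sketch, not the authors' implementation, the lowest-order instance of the identity, $S^{i} S^{j} = S^{ij} + S^{ji}$, can be checked numerically for a piecewise-linear path. The function name `signature_level2` and the random test path are assumptions introduced here for illustration only.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    Hypothetical helper for illustration; `path` has shape (n_points, d).
    The signature is built segment by segment with Chen's identity.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment
    s2 = np.zeros((d, d))   # level 2: iterated integrals
    for delta in np.diff(path, axis=0):
        # A linear segment with increment delta has level 2 equal to
        # delta (x) delta / 2; Chen's identity adds the cross term s1 (x) delta.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

# Assumed test data: a discretised random walk in three dimensions.
rng = np.random.default_rng(0)
path = np.cumsum(rng.normal(size=(50, 3)), axis=0)

s1, s2 = signature_level2(path)

# Shuffle identity at order (1, 1): S^i * S^j = S^{ij} + S^{ji}.
shuffle_gap = np.max(np.abs(np.outer(s1, s1) - (s2 + s2.T)))
print(f"largest deviation from the shuffle identity: {shuffle_gap:.2e}")
```

The deviation should be at the level of floating-point round-off. Higher-order shuffle identities follow the same pattern but need higher signature levels, for which a dedicated signature library would be the more practical choice.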
