Forest Proximities for Time Series
Ben Shaw, Jake Rhodes, Soukaina Filali Boubrahimi, Kevin R. Moon
TL;DR
PF-GAP extends RF-GAP to proximity forests for time series to produce $p_{GAP}$-based proximities that can be converted to pairwise dissimilarities and vector embeddings. It defines the DGAP distance with $d_{ij}=(1-P_{ij})^2$ and demonstrates its utility for supervised MDS embeddings and LOF-based outlier detection. Across 64 UCR2018 datasets, DGAP frequently yields the best F1 scores among nine distances, and its misclassified points show a stronger link to outliers than standard 1-NN baselines. The work supports supervised, geometry-preserving embeddings and proximity-based outlier scoring in time series, with future directions including faster PF variants and broader applications.
Abstract
RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We use the forest proximities together with multidimensional scaling (MDS) to obtain vector embeddings of univariate time series, and compare these embeddings to those obtained using various time series distance measures. We also use the forest proximities together with the Local Outlier Factor (LOF) to investigate the connection between misclassified points and outliers, comparing against nearest neighbor classifiers that use time series distance measures. Our results suggest that the forest proximities exhibit a stronger connection between misclassified points and outliers than nearest neighbor classifiers do.
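The pipeline described above (proximities, then dissimilarities, then an embedding and outlier scores) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the proximity matrix `P` is a toy stand-in built from a Gaussian-free exponential kernel on 1-D points rather than output from a proximity forest, the proximities are symmetrized before conversion, classical (Torgerson) MDS is used in place of whatever MDS variant the paper employs, and the helper names `proximity_to_distance` and `classical_mds` are hypothetical. Only the conversion $d_{ij}=(1-P_{ij})^2$ and the use of LOF on precomputed dissimilarities come from the text.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def proximity_to_distance(P):
    """Map proximities in [0, 1] to dissimilarities d_ij = (1 - P_ij)^2."""
    P = np.asarray(P, dtype=float)
    P = 0.5 * (P + P.T)        # assumption: symmetrize before converting
    D = (1.0 - P) ** 2
    np.fill_diagonal(D, 0.0)   # each point is at distance zero from itself
    return D


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS on a precomputed dissimilarity matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)   # drop negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)


# Toy stand-in for forest proximities: eight nearby points and one far away.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0])
P = np.exp(-np.abs(x[:, None] - x[None, :]))   # values in (0, 1], diagonal 1

D = proximity_to_distance(P)
embedding = classical_mds(D, n_components=2)   # one 2-D vector per series

# LOF computed directly on the precomputed dissimilarities.
lof = LocalOutlierFactor(n_neighbors=3, metric="precomputed")
labels = lof.fit_predict(D)                    # -1 marks flagged outliers
scores = -lof.negative_outlier_factor_         # larger means more outlying
most_outlying = int(np.argmax(scores))
```

With real data, `P` would be replaced by the PF-GAP proximity matrix computed on the training series, and the resulting LOF scores could then be compared against the set of misclassified points.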
