Forest Proximities for Time Series

Ben Shaw; Jake Rhodes; Soukaina Filali Boubrahimi; Kevin R. Moon

TL;DR

PF-GAP extends RF-GAP to proximity forests for time series to produce $p_{GAP}$-based proximities that can be converted to pairwise dissimilarities and vector embeddings. It defines the DGAP distance with $d_{ij}=(1-P_{ij})^2$ and demonstrates its utility for supervised MDS embeddings and LOF-based outlier detection. Across 64 UCR2018 datasets, DGAP frequently yields the best F1 scores among nine distances, and its misclassified points show a stronger link to outliers than standard 1-NN baselines. The work supports supervised, geometry-preserving embeddings and proximity-based outlier scoring in time series, with future directions including faster PF variants and broader applications.

Abstract

RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We use the forest proximities in connection with Multi-Dimensional Scaling to obtain vector embeddings of univariate time series, comparing the embeddings to those obtained using various time series distance measures. We also use the forest proximities alongside Local Outlier Factors to investigate the connection between misclassified points and outliers, comparing with nearest neighbor classifiers which use time series distance measures. We show that the forest proximities seem to exhibit a stronger connection between misclassified points and outliers than nearest neighbor classifiers.
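
As a rough illustration of the conversion described in the TL;DR, the sketch below turns a proximity matrix into DGAP-style dissimilarities via $d_{ij}=(1-P_{ij})^2$ and embeds them in two dimensions. Several things here are assumptions rather than the paper's method: the toy matrix `P_toy` is synthetic (it does not come from a proximity forest), the proximities are symmetrized by averaging, the diagonal of the distance matrix is set to zero, and classical (Torgerson) MDS is used as a stand-in for whichever MDS variant the authors applied. The helper names `proximity_to_distance` and `classical_mds` are hypothetical.

```python
import numpy as np


def proximity_to_distance(P):
    """Map proximities to dissimilarities with d_ij = (1 - P_ij)^2."""
    P = np.asarray(P, dtype=float)
    # Assumption: average P with its transpose so the result is symmetric.
    P_sym = 0.5 * (P + P.T)
    D = (1.0 - P_sym) ** 2
    # Assumption: treat every point as being at distance zero from itself.
    np.fill_diagonal(D, 0.0)
    return D


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS on a precomputed distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # drop negative eigenvalues
    return eigvecs[:, order] * np.sqrt(top_vals)


# Synthetic stand-in for forest proximities: two classes of 10 points each,
# with high within-class and low between-class proximity plus small noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
same = labels[:, None] == labels[None, :]
noise = rng.uniform(-0.05, 0.05, size=(20, 20))
P_toy = np.clip(np.where(same, 0.8, 0.1) + noise, 0.0, 1.0)

D = proximity_to_distance(P_toy)
embedding = classical_mds(D, n_components=2)
print(embedding.shape)
```

The same matrix `D` could also be handed to an outlier scorer that accepts precomputed distances, for example scikit-learn's `LocalOutlierFactor` with `metric="precomputed"`, to obtain LOF scores in the spirit of the outlier experiments summarized above.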

Paper Structure (11 sections, 6 equations, 1 figure, 1 table)

Introduction
Background
Proximity Forests
Random Forest Proximities
Time Series Outlier Detection
Methods
PF-GAP and Pairwise Dissimilarity
Outlier Detection
Experiments
Discussion of Limitations
Conclusion

Figures (1)

Figure 1: MDS embeddings of the GunPoint dataset.
