Online Partitioned Local Depth for semi-supervised applications
John D. Foley, Justin T. Lee
TL;DR
The paper introduces online PaLD, an extension of the partitioned local depth (PaLD) framework for settings with a fixed reference dataset, such as semi-supervised prediction and online anomaly detection. After an upfront $O(n^3)$ preprocessing step that builds a queryable cohesion network, each new datum is integrated exactly in $O(n^2)$ time, so the speedup does not rely on approximation. The authors present a concrete query formulation, including the CohesionToNew and CohesionToS routines, validate the resulting speedups, and showcase two applications to health-care datasets: online anomaly detection and high-dimensional semi-supervised classification. The work also discusses practical considerations, possible hybridization with approximation methods, and extensions to generalized PaLD frameworks and large-scale data.
Abstract
We introduce an extension of the partitioned local depth (PaLD) algorithm that is adapted to online applications such as semi-supervised prediction. The new algorithm we present, online PaLD, is well-suited to situations where it is possible to pre-compute a cohesion network from a reference dataset. After $O(n^3)$ steps to construct a queryable data structure, online PaLD can extend the cohesion network to a new data point in $O(n^2)$ time. Our approach complements previous speed-up approaches based on approximation and parallelism. As illustrations, we present applications to online anomaly detection and semi-supervised classification for health-care datasets.
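For orientation, the $O(n^3)$ cost quoted above is the cost of building a cohesion network from scratch. The following is a minimal sketch of that batch computation, assuming the standard PaLD definition of cohesion from the earlier PaLD literature, with ties split evenly and a $1/(n-1)$ normalization. It is not the online algorithm of this paper and does not implement CohesionToNew or CohesionToS; the function name `pald_cohesion` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def pald_cohesion(D):
    """Batch PaLD cohesion matrix from a symmetric distance matrix D.

    Assumed convention: C[x, w] is the cohesion of w to x, ties in
    distance are split evenly, and the result is scaled by 1 / (n - 1).
    Runs in O(n^3) time: O(n^2) pairs, O(n) work per pair.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(x + 1, n):
            dxy = D[x, y]
            # Local focus of the pair: points within d(x, y) of x or of y.
            in_focus = (D[x] <= dxy) | (D[y] <= dxy)
            weight = 1.0 / in_focus.sum()
            closer_x = in_focus & (D[x] < D[y])
            closer_y = in_focus & (D[y] < D[x])
            tied = in_focus & (D[x] == D[y])
            # Each focus point supports whichever of x, y it is closer to.
            C[x, closer_x] += weight
            C[y, closer_y] += weight
            C[x, tied] += 0.5 * weight
            C[y, tied] += 0.5 * weight
    return C / (n - 1)


# Toy example: three nearby points on a line and one distant point.
points = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(points[:, None] - points[None, :])
C = pald_cohesion(D)
local_depth = C.sum(axis=1)  # row sums give the local depth of each point
print(local_depth)
```

Under these conventions the local depths sum to $n/2$, and the distant point receives the smallest local depth, which is the kind of signal an anomaly detector could threshold. Recomputing this matrix for every arriving point would cost $O(n^3)$ each time, which is the cost the online formulation is designed to avoid.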
