Faster, Deterministic and Space Efficient Subtrajectory Clustering
Ivor van der Hoog, Thijs van der Horst, Tim Ophelders
TL;DR
This work tackles subtrajectory clustering under the Fréchet distance. It seeks an $(\ell,\Delta)$-clustering: a set of center curves of complexity at most $\ell$ such that the trajectory can be covered by subcurves that are each within Fréchet distance $\Delta$ of some center. It introduces a pathlet-preserving $2\Delta$-simplification and restricts the candidate center curves to vertex subcurves or subedges of a single curve $S$. This enables a deterministic, scalable greedy algorithm that computes an $(\ell,4\Delta)$-clustering of size $\mathcal{O}(k\log n)$ in $\mathcal{O}(k n^3\log^4 n)$ time and $\mathcal{O}(n^3)$ space, where $k$ is the size of an optimal $(\ell,\Delta)$-clustering. The approach hinges on a carefully defined universe of intervals and an efficient greedy, set-cover-like procedure, supported by reachability graphs that translate Fréchet reachability into rectilinear shortest paths. Compared to prior deterministic and randomized methods, the method improves the approximation factor in $\Delta$ (from $11\Delta$ to $4\Delta$), matches or improves space usage, and is deterministic while improving known expected running times by a factor near-linear in $\ell$. The techniques may have broad impact on map reconstruction and trajectory analysis, where deterministic, scalable clustering under the Fréchet distance is required.
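The greedy, set-cover-like idea can be illustrated with the textbook greedy set cover algorithm. This is a minimal sketch, not the authors' procedure: the universe is assumed to be a set of integer indices standing in for pieces of the trajectory, and each hypothetical candidate (a name mapped to the indices it covers) stands in for the subcurves that lie within Fréchet distance of one candidate center. At each step the greedy rule picks the candidate covering the most still-uncovered elements; for a universe of size $N$ it uses at most $H_N \le 1 + \ln N$ times the optimal number of sets, the kind of logarithmic factor that appears in an $\mathcal{O}(k \log n)$ size bound.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_cover(universe: Set[int], candidates: Dict[str, FrozenSet[int]]) -> List[str]:
    """Textbook greedy set cover over a hypothetical universe of integer indices.

    Repeatedly chooses the candidate that covers the most uncovered elements.
    Ties are broken by candidate name, so the output is deterministic.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # max() returns the first maximal element, i.e. the smallest name among ties.
        best = max(
            sorted(candidates),
            key=lambda name: len(candidates[name] & uncovered),
            default=None,
        )
        if best is None or not (candidates[best] & uncovered):
            raise ValueError("the candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen
```

For example, with universe `set(range(10))`, candidate `"A"` covering 0 to 5 and `"B"` covering 5 to 9, the sketch picks `"A"` first (largest gain) and then `"B"`.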
Abstract
Given a trajectory $T$ and a distance $\Delta$, we wish to find a set $C$ of curves of complexity at most $\ell$ such that we can cover $T$ with subcurves that are each within Fréchet distance $\Delta$ of at least one curve in $C$. We call $C$ an $(\ell,\Delta)$-clustering and aim to find an $(\ell,\Delta)$-clustering of minimum cardinality. This problem variant was introduced by Akitaya et al. (2021) and shown to be NP-complete. The main focus has therefore been on bicriteria approximation algorithms, which allow the clustering to be an $(\ell, \Theta(\Delta))$-clustering of roughly optimal size. We present algorithms that construct $(\ell,4\Delta)$-clusterings of size $\mathcal{O}(k \log n)$, where $k$ is the size of an optimal $(\ell, \Delta)$-clustering. Our algorithms use $\mathcal{O}(n^3)$ space and $\mathcal{O}(k n^3 \log^4 n)$ time. They significantly improve upon prior work in clustering quality (a smaller approximation factor in $\Delta$) and in size (whenever $\ell \in \Omega(\log n / \log k)$). Our running times are deterministic and improve known expected bounds by a factor near-linear in $\ell$. Additionally, we match the space usage of prior work, and improve upon the space usage of prior deterministic results by a factor super-linear in $n\ell$.
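To make the $(\ell,\Delta)$-clustering definition concrete, the sketch below checks a proposed clustering. It rests on simplifying assumptions that differ from the paper's setting: complexity is counted as the number of vertices, covering subcurves are restricted to vertex-to-vertex pieces given as hypothetical `(start, end, center_index)` triples, and the continuous Fréchet distance is replaced by the discrete Fréchet distance (a coupling of vertices only). Because the continuous Fréchet distance never exceeds the discrete one, the check is conservative: everything it accepts is a valid clustering, but it may reject some valid ones.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet_within(p: Sequence[Point], q: Sequence[Point], delta: float) -> bool:
    """Decide whether the discrete Fréchet distance of vertex sequences p and q is <= delta.

    reach[i][j] is True if some monotone coupling of p[0..i] and q[0..j]
    keeps every coupled pair of vertices within distance delta.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("curves must have at least one vertex")
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if math.dist(p[i], q[j]) > delta:
                continue  # this pair of vertices can never be coupled
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]


def is_clustering(
    trajectory: Sequence[Point],
    centers: List[Sequence[Point]],
    pieces: List[Tuple[int, int, int]],
    ell: int,
    delta: float,
) -> bool:
    """Conservatively check a candidate (ell, delta)-clustering.

    Each hypothetical piece (start, end, center_index) matches the vertex
    subcurve trajectory[start..end] (inclusive) to centers[center_index].
    """
    if any(len(c) > ell for c in centers):
        return False  # some center has more than ell vertices
    for start, end, ci in pieces:
        if not (0 <= start <= end < len(trajectory) and 0 <= ci < len(centers)):
            return False  # malformed piece
        if not discrete_frechet_within(trajectory[start:end + 1], centers[ci], delta):
            return False  # piece is not close enough to its center
    # The union of the index intervals [start, end] must cover [0, len - 1].
    covered_up_to = 0
    for start, end, _ in sorted(pieces):
        if start > covered_up_to:
            return False  # gap between consecutive pieces
        covered_up_to = max(covered_up_to, end)
    return bool(pieces) and covered_up_to == len(trajectory) - 1
```

For instance, the segment with vertices (0, 0), (1, 0), (2, 0) and the center (0, 0), (2, 0) have continuous Fréchet distance 0 but discrete Fréchet distance 1, so the sketch accepts this pair only for `delta >= 1`. This illustrates the conservativeness noted above.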
