Efficient Greedy Discrete Subtrajectory Clustering
Ivor van der Hoog, Lara Ost, Eva Rotenberg, Daniel Rutschmann
TL;DR
This work clusters trajectories by grouping their subtrajectories into Δ-clusters, in which every member lies within discrete Fréchet distance Δ of a centre subtrajectory. It provides optimized implementations of the single-cluster SC routines and a greedy clustering framework that uses them as a subroutine. It also introduces PSC, an $O(n^2 \log^4 n)$-time algorithm that computes a 2-approximation of the Pareto front over cluster cardinality and centre length. Experiments on real and synthetic data show substantial runtime and memory improvements over prior single-core methods, with competitive clustering quality. The methods support scalable map construction and movement-pattern discovery on large trajectory collections.
Abstract
We cluster a set of trajectories T using subtrajectories of T. Clustering quality may be measured by the number of clusters, the number of vertices of T that are absent from the clustering, and by the Fréchet distance between subtrajectories in a cluster. A $\Delta$-cluster of T is a cluster ${\mathcal{P}}$ of subtrajectories of T with a centre $P \in {\mathcal{P}}$ of complexity $\ell$, where all subtrajectories in ${\mathcal{P}}$ have Fréchet distance at most $\Delta$ to $P$. Buchin, Buchin, Gudmundsson, Löffler and Luo present two $O(n^2 + n m \ell)$-time algorithms: SC($\max$, $\ell$, $\Delta$, T) computes a single $\Delta$-cluster where $P$ has at least $\ell$ vertices and maximises the cardinality $m$ of ${\mathcal{P}}$. SC($m$, $\max$, $\Delta$, T) computes a single $\Delta$-cluster where ${\mathcal{P}}$ has cardinality $m$ and maximises the complexity $\ell$ of $P$. We use such maximum-cardinality clusters in a greedy clustering algorithm. First, we provide an efficient implementation of SC($\max$, $\ell$, $\Delta$, T) and SC($m$, $\max$, $\Delta$, T) that significantly outperforms previous implementations. Next, we use these functions as a subroutine in the greedy clustering algorithm, which performs well when compared to existing subtrajectory clustering algorithms on real-world data. Finally, we observe that, for fixed $\Delta$ and T, these two functions always output a point on the Pareto front of some bivariate function $\theta(\ell, m)$. We design a new algorithm PSC($\Delta$, T) that in $O(n^2 \log^4 n)$ time computes a $2$-approximation of this Pareto front. This yields a broader set of candidate clusters, with comparable quality. We show that using PSC($\Delta$, T) as a subroutine improves the clustering quality and performance even further.
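To make the definitions above concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the discrete Fréchet distance with the textbook quadratic dynamic program, checks whether a set of subtrajectories forms a $\Delta$-cluster around a given centre, and filters a list of $(\ell, m)$ pairs down to its Pareto front. The function names `discrete_frechet`, `is_delta_cluster` and `pareto_front` are hypothetical. The sketch also assumes that trajectories are non-empty lists of coordinate tuples and that a pair $(\ell, m)$ dominates another when it is at least as large in both coordinates.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    Textbook O(|p| * |q|) dynamic program, keeping only one previous row.
    """
    prev = None
    for i, a in enumerate(p):
        cur = []
        for j, b in enumerate(q):
            d = dist(a, b)
            if i == 0 and j == 0:
                reach = d
            elif i == 0:
                reach = max(d, cur[j - 1])
            elif j == 0:
                reach = max(d, prev[0])
            else:
                # Cheapest way to arrive at (i, j): advance on p, on q, or on both.
                reach = max(d, min(prev[j], prev[j - 1], cur[j - 1]))
            cur.append(reach)
        prev = cur
    return prev[-1]


def is_delta_cluster(centre, members, delta):
    """True if every member is within discrete Fréchet distance delta of centre."""
    return all(discrete_frechet(centre, s) <= delta for s in members)


def pareto_front(pairs):
    """Keep the (l, m) pairs not dominated by a pair that is >= in both coordinates.

    Assumption: larger centre complexity l and larger cardinality m are both better.
    """
    front = []
    best_m = -1
    # Scan by decreasing l; a pair survives only if it improves on the best m so far.
    for l, m in sorted(set(pairs), key=lambda t: (-t[0], -t[1])):
        if m > best_m:
            front.append((l, m))
            best_m = m
    return front
```

For example, with centre `[(0, 0), (1, 0), (2, 0)]`, the subtrajectory `[(0, 0.5), (1, 0.5), (2, 0.5)]` is within distance 0.5 and so belongs to a $\Delta$-cluster for $\Delta = 1$, while `[(0, 0), (1, 0), (2, 3)]` does not. This brute-force check is only meant to illustrate the definitions; it is far slower than the SC and PSC algorithms discussed in the abstract.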
