Finding Complex Patterns in Trajectory Data via Geometric Set Cover
Jacobus Conradi, Anne Driemel
TL;DR
The paper tackles subtrajectory clustering under the Fréchet distance by reframing it as a geometric SetCover problem. It extends prior work by allowing center trajectories of complexity up to $l$, and constructs a structured set system with provable guarantees: an $11\Delta$ distance bound and an $O(\log n)$ approximation for the number of centers, with time complexity $\tilde{O}(l^2 n^4 + k l n^4)$. Key ideas include a $\Delta$-good simplification, extremal-point analysis in the free-space diagram, and discretization of the ground set, enabling scalable greedy coverage. The approach is validated on ocean-drift and full-body motion datasets, showing practical efficiency and robust clustering quality, suggesting strong potential for analyzing large spatio-temporal data.
Abstract
Clustering trajectories is a central challenge when faced with large amounts of movement data such as GPS data. We study a clustering problem that can be stated as a geometric set cover problem: Given a polygonal curve of complexity $n$, find the smallest number $k$ of representative trajectories of complexity at most $l$ such that any point on the input trajectories lies on a subtrajectory of the input that has Fréchet distance at most $Δ$ to one of the representative trajectories. In previous work, Brüning et al. (2022) developed a bicriteria approximation algorithm that returns a set of curves of size $O(kl\log(kl))$ which covers the input with a radius of $11Δ$ in time $\widetilde{O}((kl)^2n + kln^3)$, where $k$ is the smallest number of curves of complexity $l$ needed to cover the input with a radius of $Δ$. The representative trajectories computed by this algorithm are always line segments. In applications, however, one is usually interested in more complex representative curves which consist of several edges. We present a new approach that builds upon previous work and computes a set of curves of size $O(k\log(n))$ in time $\widetilde{O}(l^2n^4 + kln^4)$ with the same distance guarantee of $11Δ$, where each curve may have complexity up to the given complexity parameter $l$. We conduct experiments on tracking data of ocean currents and full-body motion data; the results suggest that the approach is a valid tool for analyzing large spatio-temporal data sets.
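
To illustrate the set cover view, the following is a minimal Python sketch of the classic greedy set cover heuristic, which is the standard way to obtain a logarithmic approximation factor for the number of chosen sets. It is not the authors' implementation: the function name `greedy_set_cover`, the integer ground set, and the candidate sets `c1` to `c4` are hypothetical toy data. In the paper's setting, the ground set would be a discretization of the input curve and each candidate set would hold the points covered by one candidate center curve; constructing those sets is the hard part and is not shown here.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Pick candidate sets greedily until every element of `universe` is covered.

    Assumption: `candidates` maps a (hypothetical) center name to the set of
    ground-set elements that this center covers.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Choose the candidate covering the most still-uncovered elements.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy instance: 10 sample points, 4 hypothetical candidate centers.
universe = set(range(10))
candidates = {
    "c1": {0, 1, 2, 3, 4},
    "c2": {4, 5, 6},
    "c3": {6, 7, 8, 9},
    "c4": {0, 5, 9},
}
centers = greedy_set_cover(universe, candidates)
print(centers)
```

On this toy instance the greedy rule first picks `c1` (five new points), then `c3` (four new points), and finally one more set to cover the remaining point. Each round needs one pass over all candidates, so the cost in practice is dominated by how many candidate sets exist and how quickly their coverage can be evaluated.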
