Subtrajectory Clustering and Coverage Maximization in Cubic Time, or Better
Jacobus Conradi, Anne Driemel
TL;DR
The paper tackles subtrajectory clustering under the Fréchet distance by formalizing two problems: Subtrajectory Covering (SC) and Subtrajectory Coverage Maximization (SCM). It introduces sweep-sequences and proxy coverage to enable fast, largely deterministic greedy updates, yielding algorithms that run in cubic time, or better, with bicriteria guarantees: a $(96\ln n+128,4)$-approximation for SC and a $(\frac{e-1}{16e},4+\varepsilon)$-approximation for SCM, with running times of ${\mathcal O}\big(n^2\ell\log^2 n + \sqrt{k_\Delta}\,n^{5/2}\log^2 n\big)$ and ${\mathcal O}\big((k+\ell)n^2\varepsilon^{-2}\log^2 n\log \varepsilon^{-1}\big)$, respectively. Two ideas drive the speedup: sweep-sequences reduce a 2D search over the candidate space to 1D sweeps, and proxy coverage can be maintained symbolically with a limited number of updates. Together they avoid enumerating the full cubic-size set system. The resulting running times are subcubic in regimes where the optimal cover size is small relative to $n$. The framework also clarifies the trade-off between discretization granularity and algorithmic efficiency, and it leaves open questions on potential lower bounds and real-world applicability.
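As a quick check of the "cubic time, or better" claim, the SC running time quoted above can be bounded term by term. This is a sketch under two assumptions: $k_\Delta \le n$ (stated in the abstract as $k_\Delta \in O(n)$) and $\ell \le n$ (not stated explicitly here, assumed for illustration).

\[
n^2\ell\log^2 n + \sqrt{k_\Delta}\,n^{5/2}\log^2 n
\;\le\; n^2\cdot n\,\log^2 n + \sqrt{n}\,n^{5/2}\log^2 n
\;=\; 2\,n^3\log^2 n .
\]

Conversely, if both $\ell$ and $k_\Delta$ are constants, the same expression is ${\mathcal O}\big(n^{5/2}\log^2 n\big)$, which is the subcubic regime.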
Abstract
Many application areas collect unstructured trajectory data. In subtrajectory clustering, one is interested in finding patterns in this data using a combination of segmentation and clustering. We analyze two variants of this problem based on the well-known \textsc{SetCover} and \textsc{CoverageMaximization} problems. In both variants the set system is induced by metric balls under the Fréchet distance centered at polygonal curves. Our algorithms focus on improving the running time of the update step of the generic greedy algorithm by means of a careful combination of sweeps through a candidate space. In the first variant, we are given a polygonal curve $P$ of complexity $n$, a distance threshold $\Delta$, and a complexity bound $\ell$, and the goal is to identify a minimum-size set of center curves $\mathcal{C}$, where each center curve is of complexity at most $\ell$ and every point $p$ on $P$ is covered. A point $p$ on $P$ is covered if it is part of a subtrajectory $\pi_p$ of $P$ such that there is a center $c\in\mathcal{C}$ whose Fréchet distance to $\pi_p$ is at most $\Delta$. We present an approximation algorithm for this problem with a running time of $O((n^2\ell + \sqrt{k_\Delta}n^{5/2})\log^2n)$, where $k_\Delta$ is the size of an optimal solution. The algorithm gives a bicriteria approximation guarantee that relaxes the Fréchet distance threshold by a constant factor and the size of the solution by a factor of $O(\log n)$. The second problem variant asks for the maximum fraction of the input curve $P$ that can be covered using $k$ center curves, where $k\leq n$ is a parameter to the algorithm. Here, we show that our techniques lead to an algorithm with a running time of $O((k+\ell)n^2\log^2 n)$ and similar approximation guarantees. Note that in both algorithms $k,k_\Delta\in O(n)$ and hence the running time is cubic, or better if $k\ll n$.
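For orientation, the generic greedy algorithm that the abstract refers to can be sketched as follows. This is not the paper's algorithm: it is the textbook greedy for \textsc{SetCover} and \textsc{CoverageMaximization} on a discretized curve, with a naive update step that rescans every candidate in each round. The function name `greedy_cover`, the integer point indices, and the hand-written candidate sets are assumptions made for illustration. In the paper's setting, each candidate set would be the portion of $P$ covered by a Fréchet ball around a center curve.

```python
from typing import Dict, List, Optional, Set


def greedy_cover(
    universe: Set[int],
    candidates: Dict[str, Set[int]],
    k: Optional[int] = None,
) -> List[str]:
    """Textbook greedy over a set system (illustrative sketch only).

    universe:   indices of sample points on the curve (hypothetical discretization)
    candidates: hypothetical center name -> set of point indices it covers
    k:          None  -> keep picking until nothing more can be covered (SetCover flavour)
                int   -> stop after k picks (CoverageMaximization flavour)
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered and (k is None or len(chosen) < k):
        # Naive update step: rescan every candidate to find the largest gain.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # no candidate covers any remaining point
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy instance: 10 sample points, four made-up candidate centers.
universe = set(range(10))
candidates = {
    "a": set(range(0, 7)),
    "b": set(range(4, 10)),
    "c": set(range(2, 8)),
    "d": {0, 1, 8, 9},
}
cover_all = greedy_cover(universe, candidates)       # SetCover flavour
cover_one = greedy_cover(universe, candidates, k=1)  # CoverageMaximization flavour
```

The rescan inside the loop is the step whose cost dominates this naive version, and it is the update step that the abstract says the paper's sweeps through the candidate space are designed to speed up.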
