Subtrajectory Clustering and Coverage Maximization in Cubic Time, or Better

Jacobus Conradi, Anne Driemel

TL;DR

The paper tackles subtrajectory clustering under the Fréchet distance by formalizing two problems: Subtrajectory Covering (SC) and Subtrajectory Coverage Maximization (SCM). It introduces sweep-sequences and proxy coverage to speed up the update step of the greedy algorithm, yielding deterministic algorithms that run in cubic time or better with bicriteria guarantees: a $(96\ln n+128,4)$-approximation for SC and a $(\frac{e-1}{16e},4+\varepsilon)$-approximation for SCM, with running times of $\mathcal{O}\big(n^2\ell\log^2 n + \sqrt{k_\Delta}\,n^{5/2}\log^2 n\big)$ and $\mathcal{O}\big((k+\ell)n^2\varepsilon^{-2}\log^2 n\log \varepsilon^{-1}\big)$, respectively. The two key ideas are sweep-sequences, which reduce a 2D search to 1D sweeps, and proxy coverage, which can be maintained symbolically with limited updates; together they avoid enumerating the full cubic-size set system. The running time is subcubic when the complexity bound $\ell$ and the cover size ($k_\Delta$ for SC, $k$ for SCM) are small relative to $n$. The framework also clarifies the trade-offs between discretization granularity and algorithmic efficiency, and it points to open questions on potential lower bounds and real-world applicability. Overall, the work provides a principled, scalable approach to subtrajectory clustering under the Fréchet distance with provable approximation guarantees.

Abstract

Many application areas collect unstructured trajectory data. In subtrajectory clustering, one is interested in finding patterns in this data using a combination of segmentation and clustering. We analyze two variants of this problem based on the well-known SetCover and CoverageMaximization problems. In both variants the set system is induced by metric balls under the Fréchet distance centered at polygonal curves. Our algorithms focus on improving the running time of the update step of the generic greedy algorithm by means of a careful combination of sweeps through a candidate space. In the first variant, we are given a polygonal curve $P$ of complexity $n$, a distance threshold $\Delta$ and a complexity bound $\ell$, and the goal is to identify a minimum-size set of center curves $\mathcal{C}$, where each center curve is of complexity at most $\ell$ and every point $p$ on $P$ is covered. A point $p$ on $P$ is covered if it is part of a subtrajectory $\pi_p$ of $P$ such that there is a center $c\in\mathcal{C}$ whose Fréchet distance to $\pi_p$ is at most $\Delta$. We present an approximation algorithm for this problem with a running time of $O((n^2\ell + \sqrt{k_\Delta}n^{5/2})\log^2 n)$, where $k_\Delta$ is the size of an optimal solution. The algorithm gives a bicriterial approximation guarantee that relaxes the Fréchet distance threshold by a constant factor and the size of the solution by a factor of $O(\log n)$. The second problem variant asks for the maximum fraction of the input curve $P$ that can be covered using $k$ center curves, where $k\leq n$ is a parameter to the algorithm. Here, we show that our techniques lead to an algorithm with a running time of $O((k+\ell)n^2\log^2 n)$ and similar approximation guarantees. Note that in both algorithms $k,k_\Delta\in O(n)$ and hence the running time is cubic, or better if $k\ll n$.
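
As a rough illustration of the generic greedy algorithm whose update step the paper accelerates, the following Python sketch picks up to $k$ candidates that maximize the covered length of $[0,1]$. It is a naive baseline under strong assumptions, not the authors' method: each candidate center is assumed to come with a precomputed coverage given as a list of intervals, and the names `merge`, `length` and `greedy_coverage` are hypothetical. It re-evaluates every candidate in every round, which is exactly the cost that the paper's sweeps are designed to avoid.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Return the union of closed intervals as sorted, disjoint intervals."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def length(intervals: List[Interval]) -> float:
    """Total length of the union of the given intervals."""
    return sum(hi - lo for lo, hi in merge(intervals))


def greedy_coverage(
    candidates: List[List[Interval]], k: int
) -> Tuple[List[int], float]:
    """Naive greedy: repeatedly pick the candidate with the largest gain.

    candidates[i] is the (assumed precomputed) coverage of candidate i.
    Returns the chosen indices and the total covered length.
    """
    chosen: List[int] = []
    covered: List[Interval] = []
    for _ in range(k):
        base = length(covered)
        best_gain = 0.0
        best_idx: Optional[int] = None
        for i, cov in enumerate(candidates):
            if i in chosen:
                continue
            gain = length(covered + cov) - base
            if gain > best_gain:  # ties keep the earliest candidate
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining candidate adds coverage
            break
        chosen.append(best_idx)
        covered = merge(covered + candidates[best_idx])
    return chosen, length(covered)
```

Calling `greedy_coverage(candidates, k)` with one interval list per candidate returns the selected indices in the order they were picked, together with the covered length. For the covering variant, the same loop would instead run until the whole of $[0,1]$ is covered.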

Paper Structure

This paper contains 32 sections, 53 theorems, 37 equations, 9 figures, 5 algorithms.

Key Result

Theorem 3

There is a $(96\ln(n)+128,4)$-approximation for SC. Given a polygonal curve $P$ of complexity $n$, $\Delta>0$ and $\ell\leq n$, its running time is in $\mathcal{O}\left(\left(n^2\ell+\sqrt{k_\Delta}\,n^{\frac{5}{2}}\right)\log^2 n\right)$, where $k_\Delta$ …
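
To see why this bound is cubic in the worst case, up to polylogarithmic factors, one can substitute the extreme parameter values. The following worked step is only an illustration; the constant $c$ with $k_\Delta\leq cn$ is an assumed name for the constant hidden in $k_\Delta\in O(n)$ from the abstract, and $\ell\leq n$ is taken from the theorem statement.

$$
\left(n^2\ell+\sqrt{k_\Delta}\,n^{\frac{5}{2}}\right)\log^2 n
\;\leq\;
\left(n^2\cdot n+\sqrt{cn}\,n^{\frac{5}{2}}\right)\log^2 n
\;=\;
\left(1+\sqrt{c}\right)n^3\log^2 n .
$$

At the other extreme, if both $\ell$ and $k_\Delta$ are constants, the same expression is in $\mathcal{O}\left(n^{\frac{5}{2}}\log^2 n\right)$, which is the "or better" case.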

Figures (9)

  • Figure 1: (a) Example of all points on $P$ that lie on subcurves of $P$ that have Fréchet distance at most $\Delta$ to a curve $Q$ of complexity $3$. (b) The set $\mathrm{Cov}_P(Q,\Delta)\subset[0,1]$.
  • Figure 2: $\Delta$-free space and $\alpha\Delta$-free space of two curves $P$ and $Q$, as well as an $\alpha$-approximate $\Delta$-free space $A$ of $P$ and $Q$. Additionally marked are the extremal points of $A$.
  • Figure 3: Illustration of all Type $(I)$-, $(II)$- and $(III)$-subcurves of $Q$ (as vertical lines) that are not reversals, induced by the approximate free space of $Q$ and $P$ from Figure 2. Further marked are the values $j$, which induced the set of Type $(III)$-subcurves on the first edge of $Q$.
  • Figure 4: Illustration of three of the eight sweep-sequences in $\mathfrak{S}_{e}$ of the edge $e$ that are constructed for Type $(III)$-subcurves. One for $j=1$, one for $j=2$ and one for $j=4$.
  • Figure 5: Illustration of the proxy coverage of $e[s,t]$ and $\mathrm{rev}(e[s,t])$ compared to the coverage of $e[s,t]$. Cells with bad index are marked red. The global group of $e[s,t]$ is $\{(1,4),(4,5)\}$, and the reduced global group of $e[s,t]$ is $\{(1,5)\}$.
  • ...and 4 more figures

Theorems & Definitions (67)

  • Definition 1: Bicriterial approximation for SC
  • Definition 2: Bicriterial approximation for SCM
  • Theorem 3
  • Theorem 4
  • Definition 5: Free space diagram
  • Definition 6: Approximate free space
  • Definition 7: Extremal points [Brüning2022Faster]
  • Theorem 8
  • Lemma 9
  • Lemma 10
  • ...and 57 more