Finding Complex Patterns in Trajectory Data via Geometric Set Cover

Jacobus Conradi, Anne Driemel

TL;DR

The paper tackles subtrajectory clustering under the Fréchet distance by reframing it as a geometric set cover problem. It extends prior work by allowing center trajectories of complexity up to $l$, and constructs a structured set system with provable guarantees: an $11\Delta$ distance bound and an $O(\log n)$ approximation for the number of centers, with running time $\widetilde{O}(l^2 n^4 + k l n^4)$. Key ideas include a $\Delta$-good simplification, extremal-point analysis in the free-space diagram, and discretization of the ground set, which together enable a greedy cover computation. The approach is evaluated on ocean-drift and full-body motion datasets, and the experiments suggest it is a useful tool for analyzing large spatio-temporal data sets.
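The greedy coverage step mentioned in the TL;DR follows the classical greedy set cover heuristic, which is known to give a logarithmic approximation factor. The sketch below is a minimal, generic version over a finite ground set; it is not the paper's implementation. The function name `greedy_set_cover`, the integer ground set, and the candidate labels are illustrative assumptions. In the paper's setting the ground set would presumably be a discretization of the input curve, and each candidate a center curve together with the part of the input it covers.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[int],
                     candidates: Dict[Hashable, Set[int]]) -> List[Hashable]:
    """Pick candidates greedily until every element of `universe` is covered.

    `candidates` maps a (hypothetical) candidate label to the set of
    ground-set elements it covers.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("universe cannot be covered by the given candidates")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Toy instance: ten ground-set elements and four candidate sets.
    toy_universe = set(range(10))
    toy_candidates = {
        "A": {0, 1, 2, 3},
        "B": {3, 4, 5},
        "C": {4, 5, 6, 7, 8, 9},
        "D": {0, 9},
    }
    print(greedy_set_cover(toy_universe, toy_candidates))
```

On this toy instance the heuristic first picks the largest set `C` and then `A`, which together cover all ten elements.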

Abstract

Clustering trajectories is a central challenge when faced with large amounts of movement data such as GPS data. We study a clustering problem that can be stated as a geometric set cover problem: Given a polygonal curve of complexity $n$, find the smallest number $k$ of representative trajectories of complexity at most $l$ such that any point on the input trajectories lies on a subtrajectory of the input that has Fréchet distance at most $\Delta$ to one of the representative trajectories. In previous work, Brüning et al. (2022) developed a bicriteria approximation algorithm that returns a set of curves of size $O(kl\log(kl))$ which covers the input with a radius of $11\Delta$ in time $\widetilde{O}((kl)^2n + kln^3)$, where $k$ is the smallest number of curves of complexity $l$ needed to cover the input with a radius of $\Delta$. The representative trajectories computed by this algorithm are always line segments. In applications, however, one is usually interested in more complex representative curves which consist of several edges. We present a new approach that builds upon previous work, computing a set of curves of size $O(k\log(n))$ in time $\widetilde{O}(l^2n^4 + kln^4)$ with the same distance guarantee of $11\Delta$, where each curve may have complexity up to the given complexity parameter $l$. We conduct experiments on tracking data of ocean currents and full body motion data, suggesting its validity as a tool for analyzing large spatio-temporal data sets.
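The covering condition in the abstract can be restated in symbols. The Fréchet distance below is the standard definition; the notation $d_F$, $\mathrm{Cov}_P$, and $C$, and the parametrization of the input curve as $P\colon[0,1]\to\mathbb{R}^d$, are assumptions made here for illustration and may differ from the paper's own notation.

```latex
% Fréchet distance: f and g range over continuous, non-decreasing,
% surjective reparametrizations [0,1] -> [0,1].
d_F(P, Q) \;=\; \inf_{f,\,g}\; \max_{t \in [0,1]} \bigl\lVert P(f(t)) - Q(g(t)) \bigr\rVert

% Part of the input P covered by a representative curve Q at radius \Delta,
% where P[s,t] denotes the subtrajectory of P between parameters s and t.
\mathrm{Cov}_P(Q, \Delta) \;=\; \bigcup_{\substack{0 \le s \le t \le 1 \\ d_F(P[s,t],\, Q) \le \Delta}} [s, t]

% Clustering goal: a smallest set C of curves, each of complexity at most l,
% whose coverages together contain every point of the input.
\min\, |C| \quad \text{subject to} \quad \bigcup_{Q \in C} \mathrm{Cov}_P(Q, \Delta) \;=\; [0,1]
```

Under this reading, the bicriteria guarantees relax both sides: the returned set has size $O(k\log(n))$ rather than $k$, and it covers the input at radius $11\Delta$ rather than $\Delta$.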

Paper Structure (17 sections, 8 theorems, 6 equations, 7 figures)

Key Result

Lemma 2

There is an algorithm that, given a polygonal curve $P$ in $\mathbb{R}^d$ of complexity $n$ and a value $\Delta>0$, computes a $\Delta$-good simplification of $P$. Assuming $d$ is a constant, it runs in $O(n\log^2 n)$ time.

Figures (7)

  • Figure 1: Illustration of $\approx2000$ individual ocean surface drifters and a resulting clustering.
  • Figure 2: Illustration of the $\Delta$-coverage of $Q$ on the curve $P$.
  • Figure 3: (a) Example of the $\Delta$-free space of two curves. Also shown are the unique left $\Delta$-extremal point (red) and the two right $\Delta$-extremal points (blue) in the lower left cell of the $\Delta$-free space. (b), (c) Illustration of the proof of Lemma \ref{lem:key}: the $\Delta$-free spaces of $P$ and $Q$, with the path from $p$ to $q$ before the modification in (b) and the path from $\widehat{p}$ to $q$ after the modification in (c).
  • Figure 4: Illustration of the proof of Theorem \ref{thm:main1}. The inclusion-wise increase of the $\Delta$-coverage for all intervals in $\mathcal{I}_{\leq,>}=\{[a_1,b_1],[a_2,b_2]\}$ is depicted. The intervals $[a_1,b_1]$ and $[a_2,b_2]$ are in $\mathcal{I}_{\leq,>}$ because the left $\Delta$-extremal points $l_1^*$ and $l_2^*$ of the cells containing $(a_1,s)$ and $(a_2,s)$ lie below $s$, and similarly $r_1^*$ and $r_2^*$ lie above $t$.
  • Figure 5: Influence of different combinations of parameters on the running time, evaluated on the data set from the NOAA Global Drifter Program [gdacDataset].
  • ...and 2 more figures

Theorems & Definitions (11)

  • Definition 1: $\Delta$-good simplification [Brüning2022Faster]
  • Lemma 2 [Brüning2022Faster]
  • Theorem 3 [Brüning2022Faster]
  • Definition 4: $\Delta$-free space
  • Definition 5: $\Delta$-extremal points
  • Lemma 6
  • Theorem 7
  • Theorem 8
  • Lemma 9
  • Theorem 10
  • ...and 1 more
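
The $\Delta$-free space listed as Definition 4 is not spelled out on this page. Assuming the standard definition (all parameter pairs $(x,y)$ with $\lVert P(x)-Q(y)\rVert\le\Delta$), its intersection with the boundary of a single cell is an interval that can be found by solving a quadratic. The sketch below computes that interval for one point and one edge; the function name `free_interval` and its signature are hypothetical and not taken from the paper.

```python
import math
from typing import Optional, Sequence, Tuple


def free_interval(p: Sequence[float], a: Sequence[float], b: Sequence[float],
                  delta: float) -> Optional[Tuple[float, float]]:
    """Return the interval of t in [0, 1] with |a + t*(b - a) - p| <= delta.

    Returns None if no point of the segment ab lies within distance delta of p.
    """
    d = [bi - ai for ai, bi in zip(a, b)]   # direction of the edge
    w = [ai - pi for ai, pi in zip(a, p)]   # offset from p to the edge start
    # |w + t*d|^2 <= delta^2  <=>  qa*t^2 + qb*t + qc <= 0
    qa = sum(x * x for x in d)
    qb = 2.0 * sum(x * y for x, y in zip(d, w))
    qc = sum(x * x for x in w) - delta * delta
    if qa == 0.0:
        # Degenerate edge (a == b): either the whole edge is free or none of it.
        return (0.0, 1.0) if qc <= 0.0 else None
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    lo = max(0.0, (-qb - root) / (2.0 * qa))
    hi = min(1.0, (-qb + root) / (2.0 * qa))
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    # Point above the middle of a unit edge on the x-axis, at height 0.
    print(free_interval((0.5, 0.0), (0.0, 0.0), (1.0, 0.0), 0.25))
```

Applying this to each vertex of one curve against each edge of the other yields the free intervals on all cell boundaries, which is the usual starting point for locating extremal points within a cell.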