Efficient Greedy Discrete Subtrajectory Clustering

Ivor van der Hoog, Lara Ost, Eva Rotenberg, Daniel Rutschmann

TL;DR

The work tackles subtrajectory clustering by forming Δ-clusters: groups of subtrajectories that all lie within discrete Fréchet distance Δ of a common centre subtrajectory. It delivers optimized implementations of the SC subroutines and a greedy clustering framework built on them, and adds PSC, an $O(n^2 \log^4 n)$-time algorithm that computes a 2-approximation of the Pareto front over cluster cardinality and centre complexity. Empirical results show substantial improvements in runtime and memory relative to prior single-core methods on real and synthetic data, while achieving competitive clustering quality. The methods enable scalable map-construction and movement-pattern discovery across large trajectory collections.

Abstract

We cluster a set of trajectories T using subtrajectories of T. Clustering quality may be measured by the number of clusters, the number of vertices of T that are absent from the clustering, and by the Fréchet distance between subtrajectories in a cluster. A $\Delta$-cluster of T is a cluster $\mathcal{P}$ of subtrajectories of T with a centre $P \in \mathcal{P}$ with complexity $\ell$, where all subtrajectories in $\mathcal{P}$ have Fréchet distance at most $\Delta$ to $P$. Buchin, Buchin, Gudmundsson, Löffler and Luo present two $O(n^2 + n m \ell)$-time algorithms: SC($\max$, $\ell$, $\Delta$, T) computes a single $\Delta$-cluster where $P$ has at least $\ell$ vertices and maximises the cardinality $m$ of $\mathcal{P}$. SC($m$, $\max$, $\Delta$, T) computes a single $\Delta$-cluster where $\mathcal{P}$ has cardinality $m$ and maximises the complexity $\ell$ of $P$. We use such maximum-cardinality clusters in a greedy clustering algorithm. We provide an efficient implementation of SC($\max$, $\ell$, $\Delta$, T) and SC($m$, $\max$, $\Delta$, T) that significantly outperforms previous implementations. We use these functions as a subroutine in a greedy clustering algorithm, which performs well when compared to existing subtrajectory clustering algorithms on real-world data. Finally, we observe that, for fixed $\Delta$ and T, these two functions always output a point on the Pareto front of some bivariate function $\theta(\ell, m)$. We design a new algorithm PSC($\Delta$, T) that in $O(n^2 \log^4 n)$ time computes a $2$-approximation of this Pareto front. This yields a broader set of candidate clusters, with comparable quality. We show that using PSC($\Delta$, T) as a subroutine improves the clustering quality and performance even further.
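
To make the $\Delta$-cluster condition concrete, the following is a minimal Python sketch of the textbook dynamic program for the discrete Fréchet distance, plus a brute-force check that every member of a candidate cluster lies within $\Delta$ of the centre. The function names `discrete_frechet` and `is_delta_cluster`, the representation of a subtrajectory as a list of 2D points, and the example coordinates are assumptions made for illustration. This is not the paper's SC implementation, which is built to avoid exactly this kind of pairwise recomputation.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    Standard O(len(p) * len(q)) dynamic program, keeping one row at a time.
    """
    prev = None
    for i, a in enumerate(p):
        cur = []
        for j, b in enumerate(q):
            d = dist(a, b)
            if i == 0 and j == 0:
                reach = 0.0                # start of both sequences
            elif i == 0:
                reach = cur[j - 1]         # can only advance along q
            elif j == 0:
                reach = prev[0]            # can only advance along p
            else:
                # advance along p, along both, or along q
                reach = min(prev[j], prev[j - 1], cur[j - 1])
            cur.append(max(d, reach))
        prev = cur
    return prev[-1]


def is_delta_cluster(centre, members, delta):
    """True if every member is within discrete Fréchet distance delta of centre."""
    return all(discrete_frechet(centre, s) <= delta for s in members)


# Hypothetical toy data: a centre with three vertices and two other subtrajectories.
centre = [(0, 0), (1, 0), (2, 0)]
shifted = [(0, 1), (1, 1), (2, 1)]
coarse = [(0, 0), (2, 0)]

print(discrete_frechet(centre, shifted))
print(is_delta_cluster(centre, [centre, shifted, coarse], delta=1.0))
```

In the abstract's notation, `len(centre)` plays the role of the complexity $\ell$ and `len(members)` the cardinality $m$.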

Paper Structure

This paper contains 41 sections, 2 theorems, 5 equations, 39 figures, 1 table, 5 algorithms.

Key Result

Lemma 8

The set $S(\Delta, \mathcal{T})$ is a $2$-approximate Pareto front.
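
As a rough illustration of what a Pareto front over (complexity $\ell$, cardinality $m$) pairs looks like, the sketch below filters a list of candidate pairs down to the non-dominated ones and tests a simple coverage condition. Two things here are assumptions, not taken from the paper: that larger is better in both coordinates, and that a set "$\alpha$-approximates" the front when every front point $(\ell, m)$ has some candidate $(\ell', m')$ with $\alpha \ell' \ge \ell$ and $\alpha m' \ge m$. The paper's formal definition may differ, and the helper names `pareto_front` and `approximates` are hypothetical.

```python
def pareto_front(points):
    """Return the (l, m) pairs that no other pair matches or beats in both coordinates."""
    front = []
    best_m = float("-inf")
    # Scan by decreasing l; a point survives only if its m beats every m seen so far.
    for l, m in sorted(set(points), key=lambda t: (-t[0], -t[1])):
        if m > best_m:
            front.append((l, m))
            best_m = m
    return front


def approximates(candidates, front, alpha=2.0):
    """Assumed coverage test: each front point is within a factor alpha of some candidate."""
    return all(
        any(alpha * cl >= l and alpha * cm >= m for cl, cm in candidates)
        for l, m in front
    )


# Hypothetical (l, m) values for clusters found at a fixed Delta.
pairs = [(2, 10), (4, 6), (8, 3), (3, 5), (8, 2)]
front = pareto_front(pairs)
print(front)
print(approximates([(4, 6)], front))
```

Under this assumed definition, a single well-placed candidate can cover several front points, which is why an approximate front can be much smaller than the exact one.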

Figures (39)

  • Figure 1: (left) Four trajectories. (right) Five subtrajectories $\mathcal{P}$. Depending on the choice of $P \in \mathcal{P}$, we may get five different clusters $(P, \mathcal{P})$.
  • Figure 2: A set of trajectories $\mathcal{T}$ and the matrix $M_\Delta(\mathcal{T}, \mathcal{T})$ as a graph.
  • Figure 3: An overview of the algorithm of Buchin, Buchin, Gudmundsson, Löffler and Luo [buchin2011detecting]. Zeroes in $M_\Delta(\mathcal{T}, \mathcal{T})$ are drawn as squares.
  • Figure 4: BBGLL implementation comparison: running time (logarithmic scaling).
  • Figure 5: BBGLL implementation comparison: memory usage (logarithmic scaling).
  • ...and 34 more figures

Theorems & Definitions (9)

  • Definition 1
  • Definition 2 (see the figure labelled fig:pathlet)
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Lemma 8
  • Theorem 9