Faster, Deterministic and Space Efficient Subtrajectory Clustering

Ivor van der Hoog, Thijs van der Horst, Tim Ophelders

TL;DR

This work addresses subtrajectory clustering under the Fréchet distance: it seeks an $(\ell,\Delta)$-clustering, that is, a set of reference curves of complexity at most $\ell$ such that the trajectory can be covered by subcurves that are each within Fréchet distance $\Delta$ of some reference curve. It introduces a pathlet-preserving $2\Delta$-simplification $S$ of the trajectory and restricts reference curves to vertex subcurves or subedges of $S$. This enables a deterministic greedy algorithm that computes an $(\ell,4\Delta)$-clustering of size $\mathcal{O}(k\log n)$ in $\mathcal{O}(k n^3\log^4 n)$ time and $\mathcal{O}(n^3)$ space, where $k$ is the size of an optimal $(\ell,\Delta)$-clustering. The approach relies on a carefully defined universe of intervals and an efficient greedy set-cover-like procedure, supported by reachability graphs that translate Fréchet reachability into rectilinear shortest paths. Compared to prior deterministic and randomized methods, it tightens the approximation factor in $\Delta$ (from $11\Delta$ to $4\Delta$), matches or improves space usage, and offers deterministic running times that improve known expected bounds by a factor near-linear in $\ell$. The techniques are relevant to map reconstruction and trajectory analysis, where deterministic, scalable clustering under the Fréchet distance is required.
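
The set-cover flavour of the greedy step can be illustrated with the textbook greedy set-cover routine, which is known to return a cover of size $\mathcal{O}(k \log n)$ when an optimal cover has size $k$. The sketch below is not the paper's algorithm: it assumes the trajectory has already been split into discrete pieces (integers) and that each candidate reference curve comes with the set of pieces it covers. The names `greedy_cover`, `universe` and `candidates` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def greedy_cover(universe: Set[int],
                 candidates: Dict[Hashable, FrozenSet[int]]) -> List[Hashable]:
    """Textbook greedy set cover (illustrative stand-in, not the paper's procedure).

    universe:   pieces of the trajectory that must be covered (assumed discrete).
    candidates: hypothetical map from a candidate reference curve to the
                pieces it covers.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pieces.
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    pieces = set(range(6))
    cands = {
        "A": frozenset({0, 1, 2, 3}),
        "B": frozenset({2, 3, 4}),
        "C": frozenset({4, 5}),
        "D": frozenset({0, 5}),
    }
    print(greedy_cover(pieces, cands))
```

In this toy instance the routine first picks `"A"` (four new pieces) and then `"C"` (the two remaining pieces). According to the summary above, the paper's contribution lies in how the candidate universe is defined and how the best candidate is found efficiently, which this sketch does not model.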

Abstract

Given a trajectory $T$ and a distance $\Delta$, we wish to find a set $C$ of curves of complexity at most $\ell$, such that we can cover $T$ with subcurves that each are within Fréchet distance $\Delta$ to at least one curve in $C$. We call $C$ an $(\ell,\Delta)$-clustering and aim to find an $(\ell,\Delta)$-clustering of minimum cardinality. This problem variant was introduced by Akitaya et al. (2021) and shown to be NP-complete. The main focus has therefore been on bicriteria approximation algorithms, allowing for the clustering to be an $(\ell, \Theta(\Delta))$-clustering of roughly optimal size. We present algorithms that construct $(\ell,4\Delta)$-clusterings of $\mathcal{O}(k \log n)$ size, where $k$ is the size of the optimal $(\ell, \Delta)$-clustering. We use $\mathcal{O}(n^3)$ space and $\mathcal{O}(k n^3 \log^4 n)$ time. Our algorithms significantly improve upon the clustering quality (improving the approximation factor in $\Delta$) and size (whenever $\ell \in \Omega(\log n / \log k)$). We offer deterministic running times improving known expected bounds by a factor near-linear in $\ell$. Additionally, we match the space usage of prior work, and improve it substantially, by a factor super-linear in $n\ell$, when compared to deterministic results.
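
To make the condition "within Fréchet distance $\Delta$" concrete, the sketch below decides the discrete Fréchet distance between two polylines by dynamic programming. This is a swapped-in simplification: the paper works with the continuous Fréchet distance, whereas the discrete variant only matches vertices to vertices and is an upper bound on the continuous one. The function name `discrete_frechet_at_most` and the 2D point representation are assumptions made for illustration.

```python
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet_at_most(p: Sequence[Point], q: Sequence[Point],
                             delta: float) -> bool:
    """Decide whether the discrete Fréchet distance of p and q is <= delta.

    Illustrative only: the discrete variant upper-bounds the continuous
    Fréchet distance used in the paper.
    """
    if not p or not q:
        raise ValueError("both curves need at least one vertex")
    n, m = len(p), len(q)
    # reach[i][j] is True when the prefixes p[:i+1] and q[:j+1] admit a
    # monotone vertex matching in which every matched pair is within delta.
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if dist(p[i], q[j]) > delta:
                continue  # this pair is too far apart (an "obstacle" cell)
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]


if __name__ == "__main__":
    curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    reference = [(0.0, 1.0), (2.0, 1.0)]
    print(discrete_frechet_at_most(curve, reference, 1.5))  # True
    print(discrete_frechet_at_most(curve, reference, 1.0))  # False
```

In the second call the middle vertex `(1.0, 0.0)` is at distance $\sqrt{2} > 1$ from both vertices of `reference`, so no vertex matching exists. A continuous matching could instead pair it with the midpoint of the reference segment at distance 1, which shows how the discrete variant can overestimate.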

Paper Structure (21 sections, 23 theorems, 5 equations, 9 figures, 1 table)

Key Result

Theorem 7

Let $(S, f, g)$ be a pathlet-preserving simplification of $T$. For any $(\ell, \Delta)$-pathlet $(P, \mathcal{I})$, there exists a subcurve $S[s, t]$ such that $(S[s, t], \mathcal{I})$ is an $(\ell +2 - |\mathbb{N} \cap \{s, t\}|, 4\Delta)$-pathlet.
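
The complexity bound in Theorem 7 can be unpacked by cases. The display below is a reading aid rather than part of the theorem; it assumes the usual convention that integer parameters of $S$ correspond to its vertices, and that $s \neq t$.

```latex
\[
\ell + 2 - \lvert \mathbb{N} \cap \{s, t\} \rvert =
\begin{cases}
\ell     & \text{if } s \in \mathbb{N} \text{ and } t \in \mathbb{N}, \\
\ell + 1 & \text{if exactly one of } s, t \text{ lies in } \mathbb{N}, \\
\ell + 2 & \text{if } s \notin \mathbb{N} \text{ and } t \notin \mathbb{N}.
\end{cases}
\]
```

Under that assumption, a reference curve that starts and ends at vertices of $S$ stays within complexity $\ell$, and each endpoint that falls in the interior of an edge costs one additional unit of complexity.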

Figures (9)

  • Figure 1: The trajectory $T$ (blue, left) is covered by three pathlets. Each pathlet is defined by a reference curve (green, red, yellow) and the subcurve(s) of $T$ the curve covers.
  • Figure 2: Top left: A simplification $S$ (red) of the trajectory $T$ (blue). Right: The diagram $\Delta'$-$\mathrm{FSD}(S, T)$ in white. The obstacles of the diagram are colored in gray. The clustering (bottom left) corresponds to a set of colored bimonotone paths, where paths of a given color are horizontally aligned, and the paths together span the entire vertical axis.
  • Figure 3: There exists a segment $P$ where $d_F(P, T[a, b]) \leq \Delta$. In contrast, for any vertex-restricted $S$ with $d_F(T[a, b], S) \leq \Delta$, the complexity of $S$ is $\Theta(|T[a, b]|)$.
  • Figure 4: A pathlet (left), corresponding to the red $\Delta'$-matching (right), gets split into a vertex-to-vertex and two subedge pathlets. The new pathlets correspond to the parts of the red matching that are vertically above the part of the $x$-axis corresponding to the new reference curve.
  • Figure 5: (left) The $\Delta'$-free space diagram of $W$ and $T$ with points $p$ and $q$ connected by a bimonotone path. (right) The obstacles of $\mathcal{R}$ are made up of all grid edges that are entirely contained in the obstacles of $\Delta'$-$\mathrm{FSD}(W, T)$ (shown in black) plus the gray segments. We may transform any bimonotone path between $p$ and $q$ into one that lies in $\Delta'$-$\mathrm{FSD}(W, T)$.
  • ...and 4 more figures

Theorems & Definitions (35)

  • Definition 1: Pathlet
  • Definition 2
  • Definition 3
  • Definition 4: Reference optimal
  • Definition 5
  • Definition 6
  • Theorem 7
  • Theorem 8
  • Definition 9
  • Definition 10
  • ...and 25 more