Faster, Deterministic and Space Efficient Subtrajectory Clustering

Ivor van der Hoog; Thijs van der Horst; Tim Ophelders

TL;DR

This work addresses subtrajectory clustering under the Fréchet distance: find an $(\ell,\Delta)$-clustering, a set of reference curves of complexity at most $\ell$ such that the trajectory is covered by subcurves, each within Fréchet distance $\Delta$ of some reference curve. It introduces a pathlet-preserving $2\Delta$-simplification and constrains reference curves to vertex subcurves or subedges of the simplified curve $S$. This yields a deterministic greedy algorithm that computes an $(\ell,4\Delta)$-clustering of size $\mathcal{O}(k\log n)$ in $\mathcal{O}(k n^3\log^4 n)$ time and $\mathcal{O}(n^3)$ space, where $k$ is the size of an optimal $(\ell,\Delta)$-clustering. The approach relies on a carefully defined universe of intervals and a greedy set-cover-like procedure, supported by reachability graphs that translate Fréchet reachability into rectilinear shortest paths. Compared to prior deterministic and randomized methods, it tightens the approximation factor in $\Delta$ (from $11\Delta$ to $4\Delta$), matches or improves space usage, and gives deterministic running times that improve on known expected bounds by a factor near-linear in $\ell$. The techniques may be useful for map reconstruction and trajectory analysis, where scalable clustering under the Fréchet distance with provable guarantees is required.
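To make the "greedy set-cover-like procedure" concrete, the following is a minimal sketch of classic greedy set cover on a finite universe. It is not the paper's algorithm, which works on intervals of the trajectory and uses reachability graphs to find good candidates; the function name `greedy_cover`, the discretized universe, and the candidate labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_cover(universe: Iterable[int],
                 candidates: Dict[Hashable, Set[int]]) -> List[Hashable]:
    """Classic greedy set cover (illustrative only).

    `universe` stands in for discretized pieces of the trajectory and each
    candidate maps a hypothetical pathlet label to the pieces it covers.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pieces.
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Six trajectory pieces and four hypothetical pathlets.
pieces = range(6)
pathlets = {"a": {0, 1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {0, 5}}
cover = greedy_cover(pieces, pathlets)
```

Greedy set cover of this kind is known to return a cover whose size is within a logarithmic factor of the optimum, which is the source of the $\log n$ term in the $\mathcal{O}(k\log n)$ clustering size.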

Abstract

Given a trajectory $T$ and a distance $\Delta$, we wish to find a set $C$ of curves of complexity at most $\ell$, such that we can cover $T$ with subcurves that each are within Fréchet distance $\Delta$ to at least one curve in $C$. We call $C$ an $(\ell,\Delta)$-clustering and aim to find an $(\ell,\Delta)$-clustering of minimum cardinality. This problem variant was introduced by Akitaya et al. (2021) and shown to be NP-complete. The main focus has therefore been on bicriteria approximation algorithms, allowing for the clustering to be an $(\ell, \Theta(\Delta))$-clustering of roughly optimal size. We present algorithms that construct $(\ell,4\Delta)$-clusterings of $\mathcal{O}(k \log n)$ size, where $k$ is the size of the optimal $(\ell, \Delta)$-clustering. We use $\mathcal{O}(n^3)$ space and $\mathcal{O}(k n^3 \log^4 n)$ time. Our algorithms significantly improve upon the clustering quality (improving the approximation factor in $\Delta$) and size (whenever $\ell \in \Omega(\log n / \log k)$). We offer deterministic running times improving known expected bounds by a factor near-linear in $\ell$. Additionally, we match the space usage of prior work, and improve it substantially, by a factor super-linear in $n\ell$, when compared to deterministic results.
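As a rough illustration of the distance measure behind the condition "within Fréchet distance $\Delta$", the sketch below computes the discrete Fréchet distance between two vertex sequences with the standard dynamic program. This is a swapped-in simplification: the paper works with the continuous Fréchet distance, for which the discrete variant is only an upper bound. The function name `discrete_frechet` and the sample curves are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet(P: Sequence[Point], Q: Sequence[Point]) -> float:
    """Discrete Frechet distance between two non-empty vertex sequences."""
    n, m = len(P), len(Q)
    d: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                d[i][j] = cost
            elif i == 0:
                d[i][j] = max(d[0][j - 1], cost)
            elif j == 0:
                d[i][j] = max(d[i - 1][0], cost)
            else:
                # Advance on P, on Q, or on both; keep the cheapest prefix.
                prefix = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
                d[i][j] = max(prefix, cost)
    return d[n - 1][m - 1]


# Two parallel horizontal curves at vertical distance 1.
lower = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
upper = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
dist = discrete_frechet(lower, upper)
```

With a threshold $\Delta \ge 1$, the curve `upper` would be an admissible reference curve for `lower` under this discrete stand-in.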

Paper Structure (21 sections, 23 theorems, 5 equations, 9 figures, 1 table)

Introduction
Preliminaries
Weighting a cluster.
Algorithmic outline
Pathlet-preserving simplifications
The universe $\mathcal{U}$ and greedy set cover
Defining the universe $\mathcal{U}$.
Applying greedy set cover.
Subtrajectory clustering
The reachability graph
Constructing the graph.
Vertex-to-vertex pathlets
Subedge pathlets
Conclusion
Technical contribution.
...and 6 more sections

Key Result

Theorem 7

Let $(S, f, g)$ be a pathlet-preserving simplification of $T$. For any $(\ell, \Delta)$-pathlet $(P, \mathcal{I})$, there exists a subcurve $S[s, t]$ such that $(S[s, t], \mathcal{I})$ is an $(\ell +2 - |\mathbb{N} \cap \{s, t\}|, 4\Delta)$-pathlet.
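The complexity bound in Theorem 7 can be evaluated with a small helper. This is a sketch under the assumption that $S$ is parameterized so that integer parameters correspond to its vertices and non-integer parameters to points in the interior of an edge; the function name `reference_complexity_bound` is hypothetical.

```python
def reference_complexity_bound(ell: int, s: float, t: float) -> int:
    """Evaluate ell + 2 - |N ∩ {s, t}| from Theorem 7.

    Assumes integer parameters of S are vertices (an assumption of this
    sketch, not a statement taken from the theorem itself).
    """
    # A set, so that s == t is counted once, matching |N ∩ {s, t}|.
    integer_endpoints = {x for x in (s, t) if float(x).is_integer()}
    return ell + 2 - len(integer_endpoints)


# Both endpoints at vertices, one at a vertex, and none at a vertex.
vertex_to_vertex = reference_complexity_bound(3, 2, 5)
one_vertex = reference_complexity_bound(3, 2, 5.5)
no_vertex = reference_complexity_bound(3, 2.25, 2.75)
```

Read this way, the bound equals $\ell$ when both endpoints are vertices of $S$ and grows by one for each endpoint that lies strictly inside an edge.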

Figures (9)

Figure 1: The trajectory $T$ (blue, left) is covered by three pathlets. Each pathlet is defined by a reference curve (green, red, yellow) and the subcurve(s) of $T$ the curve covers.
Figure 2: Top left: A simplification $S$ (red) of the trajectory $T$ (blue). Right: The diagram $\Delta'$-$\mathrm{FSD}(S, T)$ in white. The obstacles of the diagram are colored in gray. The clustering (bottom left) corresponds to a set of colored bimonotone paths, where paths of a given color are horizontally aligned, and the paths together span the entire vertical axis.
Figure 3: There exists a segment $P$ where $d_F(P, T[a, b]) \leq \Delta$. In contrast, for any vertex-restricted $S$ with $d_F(T[a, b], S) \leq \Delta$, the complexity of $S$ is $\Theta(|T[a, b]|)$.
Figure 4: A pathlet (left), corresponding to the red $\Delta'$-matching (right), gets split into a vertex-to-vertex and two subedge pathlets. The new pathlets correspond to the parts of the red matching that are vertically above the part of the $x$-axis corresponding to the new reference curve.
Figure 5: (left) The $\Delta'$-free space diagram of $W$ and $T$ with points $p$ and $q$ connected by a bimonotone path. (right) The obstacles of $\mathcal{R}$ are made up of all grid edges that are entirely contained in the obstacles of $\Delta'$-$\mathrm{FSD}(W, T)$ (shown in black) plus the gray segments. We may transform any bimonotone path between $p$ and $q$ into one that lies in $\Delta'$-$\mathrm{FSD}(W, T)$.
...and 4 more figures

Theorems & Definitions (35)

Definition 1: Pathlet
Definition 2
Definition 3
Definition 4: Reference optimal
Definition 5
Definition 6
Theorem 7
Theorem 8
Definition 9
Definition 10
...and 25 more
