Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification

Yumeng Song, Yu Gu, Tianyi Li, Yushuai Li, Christian S. Jensen, Ge Yu

TL;DR

MLSimp tackles the challenge of efficient, query-aware trajectory simplification by coupling a graph neural network–based point-importance predictor (GNN-TS) with a diffusion-based generator (Diff-TS) in a mutual-learning framework. It introduces globality and uniqueness as formal metrics to capture global structure and local distinctiveness, enabling non-iterative, globally informed point retention and workload-aligned adjustments. Through alternating training, amplified signals from Diff-TS sharpen GNN-TS’s predictions and vice versa, yielding simplified trajectories that preserve query accuracy while reducing processing time. Experiments on Geolife, T-Drive, and OSM show a 42%–70% reduction in simplification time and query accuracy gains of up to 34.6% across range, kNN, similarity, and clustering queries, supporting MLSimp’s practicality for large-scale trajectory databases.

Abstract

As large volumes of trajectory data accumulate, simplifying trajectories to reduce storage and querying costs is increasingly studied. Existing proposals face three main problems. First, they require numerous iterations to decide which GPS points to delete. Second, they focus only on the relationships between neighboring points (local information) while neglecting the overall structure (global information), reducing the global similarity between the simplified and original trajectories and making it difficult to maintain consistency in query results, especially for similarity-based queries. Finally, they fail to differentiate the importance of points with similar features, leading to suboptimal selection of points to retain the original trajectory information. We propose MLSimp, a novel Mutual Learning query-driven trajectory simplification framework that integrates two distinct models: GNN-TS, based on graph neural networks, and Diff-TS, based on diffusion models. GNN-TS evaluates the importance of a point according to its globality, capturing its correlation with the entire trajectory, and its uniqueness, capturing its differences from neighboring points. It also incorporates attention mechanisms in the GNN layers, enabling simultaneous data integration from all points within the same trajectory and refining representations, thus avoiding iterative processes. Diff-TS generates amplified signals to enable the retention of the most important points at low compression rates. Experiments involving eight baselines on three databases show that MLSimp reduces the simplification time by 42%–70% and improves query accuracy over simplified trajectories by up to 34.6%.
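
The non-iterative idea in the abstract can be illustrated with a minimal sketch: once every point has an importance score, simplification reduces to a single top-k selection rather than repeated delete-and-re-evaluate passes. The function name `simplify_by_scores`, the `keep_ratio` parameter, and the choice to always keep both endpoints are assumptions made for illustration. The scores here are plain numbers standing in for whatever GNN-TS would predict; this is not the paper's model or its definitions of globality and uniqueness.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t); a hypothetical point layout


def simplify_by_scores(
    points: Sequence[Point],
    scores: Sequence[float],
    keep_ratio: float,
) -> List[Point]:
    """Keep the highest-scoring points in one pass (illustrative only).

    `scores` stands in for per-point importance values; how they are
    produced is outside this sketch.
    """
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    n = len(points)
    if n <= 2:
        return list(points)

    # Number of points to retain; at least the two endpoints (assumption).
    budget = max(2, min(n, round(keep_ratio * n)))

    # Always keep the first and last point, then fill the remaining budget
    # with the highest-scoring interior points.
    keep = {0, n - 1}
    interior = sorted(range(1, n - 1), key=lambda i: scores[i], reverse=True)
    keep.update(interior[: budget - 2])

    # Emit retained points in their original temporal order.
    return [points[i] for i in sorted(keep)]


trajectory = [(0, 0, 0), (1, 0, 1), (2, 1, 2), (3, 1, 3), (4, 0, 4)]
importance = [0.1, 0.9, 0.2, 0.8, 0.3]
simplified = simplify_by_scores(trajectory, importance, keep_ratio=0.6)
```

With five points and `keep_ratio=0.6`, the budget is three points: the two endpoints plus the single highest-scoring interior point.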

Paper Structure

This paper contains 36 sections, 17 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Circles denote GPS points, with solid lines connecting them to form trajectories. Colored circles indicate points retained in simplified trajectories, while white circles show points that are not retained by the current iteration.
  • Figure 2: GNN-TS model overview.
  • Figure 3: An example of the SED and PED errors, where (i) circles denote trajectory points, (ii) green solid and dashed lines denote the original and simplified trajectory, respectively, and (iii) the red and blue solid lines denote the SED of $p_{2,5}$ and the PED of $p_{2,3}$, respectively.
  • Figure 4: $F_1$-score with different compression rates (%) on Geolife, where (a)–(d) are queries following the data distribution and (e)–(h) are queries following a Gaussian distribution.
  • Figure 5: $F_1$-score with different compression rates (%) on T-Drive, where (a)–(d) are queries following the data distribution and (e)–(h) are queries following a Gaussian distribution.
  • ...and 10 more figures
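
The SED and PED errors named in the Figure 3 caption can be sketched with their commonly used definitions: the synchronized Euclidean distance compares a dropped point with the time-interpolated position on the retained segment, and the perpendicular Euclidean distance measures the point's distance to the line through the segment's endpoints. The point layout `(x, y, t)` and the function names are assumptions for illustration, and the paper's exact formulation may differ in detail (for example, in how degenerate segments are handled).

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, t); a hypothetical point layout


def sed(p: Point, start: Point, end: Point) -> float:
    """Synchronized Euclidean distance of p w.r.t. segment start -> end."""
    x, y, t = p
    xs, ys, ts = start
    xe, ye, te = end
    if te == ts:
        # Degenerate time span: fall back to the distance to the start point.
        return math.hypot(x - xs, y - ys)
    # Position on the segment at time t, assuming constant-speed movement.
    r = (t - ts) / (te - ts)
    return math.hypot(x - (xs + r * (xe - xs)), y - (ys + r * (ye - ys)))


def ped(p: Point, start: Point, end: Point) -> float:
    """Perpendicular distance of p to the line through start and end."""
    x, y, _ = p
    xs, ys, _ = start
    xe, ye, _ = end
    dx, dy = xe - xs, ye - ys
    norm = math.hypot(dx, dy)
    if norm == 0:
        # Degenerate segment: fall back to the distance to the start point.
        return math.hypot(x - xs, y - ys)
    # |cross product| / segment length gives the point-to-line distance.
    return abs(dx * (y - ys) - dy * (x - xs)) / norm


start, end = (0.0, 0.0, 0.0), (4.0, 0.0, 4.0)
dropped = (1.0, 3.0, 2.0)
sed_error = sed(dropped, start, end)  # distance to the synchronized point (2, 0)
ped_error = ped(dropped, start, end)  # distance to the line y = 0
```

In this example the two errors differ because SED accounts for time: at time 2 the moving object is expected at (2, 0), which is farther from the dropped point than the nearest point on the line.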

Theorems & Definitions (12)

  • Example 1
  • Example 2
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • ...and 2 more