Revisiting CNNs for Trajectory Similarity Learning

Zhihao Chang, Linzhu Yu, Huan Li, Sai Wu, Gang Chen, Dongxiang Zhang

TL;DR

The paper argues that the common practice of treating trajectories as sequential data leads to excessive attention on long-term global dependencies between two sequences. It introduces ConvTraj, which combines 1D and 2D convolutions to capture the sequential and geo-distribution features of trajectories, respectively.

Abstract

Similarity search is a fundamental but expensive operator in querying trajectory data, due to its quadratic complexity of distance computation. To mitigate the computational burden for long trajectories, neural networks have been widely employed for similarity learning and each trajectory is encoded as a high-dimensional vector for similarity search with linear complexity. Given the sequential nature of trajectory data, previous efforts have been primarily devoted to the utilization of RNNs or Transformers. In this paper, we argue that the common practice of treating trajectory as sequential data results in excessive attention to capturing long-term global dependency between two sequences. Instead, our investigation reveals the pivotal role of local similarity, prompting a revisit of simple CNNs for trajectory similarity learning. We introduce ConvTraj, incorporating both 1D and 2D convolutions to capture sequential and geo-distribution features of trajectories, respectively. In addition, we conduct a series of theoretical analyses to justify the effectiveness of ConvTraj. Experimental results on four real-world large-scale datasets demonstrate that ConvTraj achieves state-of-the-art accuracy in trajectory similarity search. Owing to the simple network structure of ConvTraj, the training and inference speed on the Porto dataset with 1.6 million trajectories are increased by at least $240$x and $2.16$x, respectively. The source code and dataset can be found at https://github.com/Proudc/ConvTraj.
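
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of the discrete Frechet distance, one of the trajectory measures the paper defines, computed with the standard dynamic program. The function name `discrete_frechet` and the use of Euclidean point distance are assumptions made for illustration. This is not the authors' implementation.

```python
import math


def discrete_frechet(X, Y):
    """Discrete Frechet distance between two point sequences.

    Standard O(M*N) dynamic program; Euclidean point distance is assumed.
    """
    m, n = len(X), len(Y)
    # ca[i][j] holds the discrete Frechet distance between the
    # prefixes X[0..i] and Y[0..j].
    ca = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(X[i], Y[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[m - 1][n - 1]


X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(X, Y))
```

Every pair of points is visited once, so the cost grows with the product of the two trajectory lengths. Comparing two fixed-size embedding vectors, by contrast, costs time proportional only to the embedding dimension, which is the saving that similarity learning aims for.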

Paper Structure

This paper contains 26 sections, 3 theorems, 11 equations, 7 figures, 16 tables.

Key Result

Theorem 5.1

Given two trajectories $X = (X_0, \ldots, X_M)$ and $Y = (Y_0, \ldots, Y_N)$ with $d_F(X, Y) = d_{xy}$, let $C(\cdot)$ be a one-dimensional convolution operation on $X$ and $Y$ with stride 1, padding 1, and kernel $k = (k_{i,j})$. We have: where $x_0^c=c(X_0, k)$, $y_0^c=c(Y_0, k)$, $x_M^c=c(X_M, k)$, $y_N^c=c(Y_N, k)$, $S_i=\sum_{j=0}^{2}k_{i, j}$, and $\Delta=\max\limits_{0\leq i\leq M, 0\leq j\leq N} |\sum\nolimits_{m=0}^{1} k_{m, 0}*(\delta_{m, j}^{y} \ldots$
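
The convolution setting in Theorem 5.1 can be sketched as follows. The kernel shape (2 coordinate channels by 3 taps) is inferred from the index ranges in the statement ($S_i=\sum_{j=0}^{2}k_{i,j}$ and $m$ from 0 to 1). Zero padding and the function name `conv1d_same` are assumptions made for illustration. The code only shows the operation the theorem reasons about, not the bound itself.

```python
def conv1d_same(traj, kernel):
    """1D convolution over a trajectory with stride 1 and zero padding 1.

    traj:   list of (x, y) points.
    kernel: two rows (x channel, y channel) of three weights each.
    Returns one value per input point, so the length is preserved.
    """
    # Pad each coordinate channel with one zero on both sides (padding 1).
    xs = [0.0] + [p[0] for p in traj] + [0.0]
    ys = [0.0] + [p[1] for p in traj] + [0.0]
    out = []
    for i in range(len(traj)):  # stride 1: slide one position at a time
        acc = 0.0
        for j in range(3):
            acc += kernel[0][j] * xs[i + j] + kernel[1][j] * ys[i + j]
        out.append(acc)
    return out


traj = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
kernel = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]  # sums x over a 3-point window
print(conv1d_same(traj, kernel))
```

With stride 1 and padding 1, a width-3 kernel keeps the output as long as the input, and each output value depends only on a point and its two neighbours. This locality is what the paper's argument about local similarity relies on.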

Figures (7)

  • Figure 1: Texts feature intercrossed matching pairs, whereas trajectories do not.
  • Figure 2: Input preprocessing and network structure of ConvTraj.
  • Figure 3: The training pipeline of ConvTraj.
  • Figure 4: 1D convolution bound visualization on Porto.
  • Figure 5: 1D max-pooling bound visualization on Porto.
  • ...and 2 more figures

Theorems & Definitions (9)

  • Definition 1: Trajectory
  • Definition 2: Trajectory Measure Embedding
  • Definition 3: Trajectory Coupling
  • Definition 4: Discrete Frechet Distance
  • Definition 5
  • Theorem 5.1: One-dimensional Convolution Bound
  • Theorem 5.2: One-dimensional Max-Pooling Bound
  • Definition 6: Trajectory MBR Distance
  • Theorem 5.3: Two-dimensional Convolution Bound