
Hyperspectral Trajectory Image for Multi-Month Trajectory Anomaly Detection

Md Awsafur Rahman, Chandrakanth Gudavalli, Hardik Prajapati, B. S. Manjunath

Abstract

Trajectory anomaly detection underpins applications from fraud detection to urban mobility analysis. Dense GPS methods preserve fine-grained evidence such as abnormal speeds and short-duration events, but their quadratic cost makes multi-month analysis intractable; consequently, no existing approach detects anomalies over multi-month dense GPS trajectories. The field instead relies on scalable sparse stay-point methods that discard this evidence, forcing separate architectures for each regime and preventing knowledge transfer. We argue this bottleneck is unnecessary: human trajectories, dense or sparse, share a natural two-dimensional cyclic structure along within-day and across-day axes. We therefore propose TITAnD (Trajectory Image Transformer for Anomaly Detection), which reformulates trajectory anomaly detection as a vision problem by representing trajectories as a Hyperspectral Trajectory Image (HTI): a day × time-of-day grid whose channels encode spatial, semantic, temporal, and kinematic information from either modality, unifying both under a single representation. Under this formulation, agent-level detection reduces to image classification and temporal localization to semantic segmentation. To model this representation, we introduce the Cyclic Factorized Transformer (CFT), which factorizes attention along the two temporal axes, encoding the cyclic inductive bias of human routines, while reducing attention cost by orders of magnitude and enabling dense multi-month anomaly detection for the first time. Empirically, TITAnD achieves the best AUC-PR across sparse and dense benchmarks, surpassing vision models like UNet while being 11-75× faster than the Transformer with comparable memory, demonstrating that vision reformulation and structure-aware modeling are jointly essential. Code will be made public soon.
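The released code is not yet available, but the HTI idea described in the abstract — binning a trajectory into a day × time-of-day grid whose channels carry per-slot features — can be illustrated with a minimal sketch. This is an assumption-laden toy: it uses only three channels (mean latitude, mean longitude, point count) and a fixed slot width, whereas the paper's HTI encodes richer spatio-semantic, temporal, and kinematic channels via learned encoders.

```python
import numpy as np

def build_hti(timestamps, lats, lons, slots_per_day=48):
    """Bin a dense GPS stream into a day x time-of-day grid (illustrative only).

    timestamps : seconds since the start of the observation window.
    Returns an array of shape (num_days, slots_per_day, 3) whose channels
    here are mean latitude, mean longitude, and point count; the paper's
    HTI uses richer learned spatio-semantic/temporal/kinematic channels.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    days = (timestamps // 86400).astype(int)          # across-day axis
    slot_len = 86400 / slots_per_day
    slots = ((timestamps % 86400) // slot_len).astype(int)  # within-day axis

    num_days = int(days.max()) + 1
    hti = np.zeros((num_days, slots_per_day, 3))
    counts = np.zeros((num_days, slots_per_day))
    for d, s, la, lo in zip(days, slots, lats, lons):
        hti[d, s, 0] += la
        hti[d, s, 1] += lo
        counts[d, s] += 1
    hti[..., 2] = counts
    nz = counts > 0                                   # avoid division by zero
    hti[nz, 0] /= counts[nz]                          # mean latitude per slot
    hti[nz, 1] /= counts[nz]                          # mean longitude per slot
    return hti
```

Empty slots stay zero, which mirrors the "gray: missing" cells visible in the paper's qualitative figures; a real pipeline would likely mark missingness explicitly rather than rely on zeros.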


Paper Structure

This paper contains 47 sections, 14 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Overview of our TITAnD framework. Dense GPS streams or sparse stay-points are encoded into a unified day $\times$ time-of-day hyperspectral trajectory image (HTI), where each pixel represents spatio-semantic, temporal, and kinematic information.
  • Figure 2: TITAnD's two data-specific encoders. (a) DenseTrajEmbed reshapes a raw GPS stream into a $(D,S,P,3)$ tensor, extracts per-slot spatio-semantic (SpE), temporal (TmE), and kinematic (KnE) features, and fuses them via an MLP into an HTI $\mathbf{X}\!\in\!\mathbb{R}^{D\times S\times 256}$. (b) SparseTrajEmbed encodes stay-point logs into an interleaved stop--trip sequence and uses dedicated Stop and Trip Encoders sharing the same SpE/TmE/KnE structure before a Seq2Image module maps each event onto its occupied $(d,s)$ grid cells. Both encoders produce HTI consumed by CFT.
  • Figure 3: Cyclic Factorized Transformer (CFT) factorizes attention by interleaving intra-day and inter-day attentions. Each intra-day attention captures within-day patterns, while each inter-day attention captures cross-day routine patterns.
  • Figure 4: Scaling with time horizon (2--12 months) across HTI backbones: (a) inference latency on log scale, (b) peak GPU memory, and (c) model size.
  • Figure 5: Qualitative analysis. Rows show (top to bottom) ground truth (gray: missing), model predictions, inter-day attention, and intra-day attention.
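The factorization sketched in Figure 3 can be made concrete. Below is a hedged, single-head numpy sketch of one CFT-style block under simplifying assumptions (no learned Q/K/V projections, no layer norm, no multi-head split): attention is applied independently along the within-day axis, then along the across-day axis. For a D×S grid this costs O(D·S² + S·D²) rather than the O((D·S)²) of full attention over all tokens, which is the source of the order-of-magnitude savings the abstract claims.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x, axis):
    """Single-head self-attention applied independently along one axis of a
    (D, S, C) grid. Q = K = V = x here to keep the sketch minimal; the real
    CFT would use learned projections."""
    x = np.moveaxis(x, axis, 1)                       # -> (groups, tokens, C)
    scores = x @ np.swapaxes(x, 1, 2) / np.sqrt(x.shape[-1])
    out = softmax(scores, axis=-1) @ x                # attend within each group
    return np.moveaxis(out, 1, axis)                  # restore original layout

def cft_block(hti):
    """One interleaved block: intra-day attention over the time-of-day axis,
    then inter-day attention over the day axis, each with a residual
    connection. hti has shape (D, S, C)."""
    hti = hti + axis_attention(hti, axis=1)           # within-day patterns
    hti = hti + axis_attention(hti, axis=0)           # cross-day routines
    return hti
```

Interleaving the two axis attentions lets information propagate between any two grid cells in two hops (same day, then same time slot), which is why the factorization can still capture routine structure despite never forming the full D·S × D·S attention matrix.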