Encoding Agent Trajectories as Representations with Sequence Transformers
Athanasios Tsiligkaridis, Nicholas Kalinowski, Zhongheng Li, Elizabeth Hou
TL;DR
The paper learns representations from spatiotemporal trajectories by treating them as sequences and encoding them with a Transformer-based encoder. It introduces STARE, which discretizes each trajectory into Persistent Locations, mapped to S2 cells, together with their dwell times, and then processes the two aligned subsequences (locations and durations) with a unified Transformer Encoder Stack. STARE is trained with two heads: a classifier for agent or location labels, and a masked location modelling decoder. Together they capture both agent-specific patterns and relationships between locations. The model performs strongly on simulated and real Patterns of Life (PoL) datasets and reveals interpretable structure in the embedding space. STARE learns informative embeddings without road-network data, scales to large datasets, and offers insight into agent-location dynamics, with potential for pretraining and for downstream tasks such as clustering and relationship discovery.
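The discretization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the flat latitude/longitude grid in `cell_id` is a hypothetical stand-in for a real S2 cell lookup, and the names `persistent_locations`, `cell_deg`, and `min_dwell_s`, as well as the 300-second dwell threshold, are assumptions made for illustration.

```python
import math
from typing import List, Tuple

# One GPS ping: (timestamp in seconds, latitude, longitude).
Ping = Tuple[float, float, float]
Cell = Tuple[int, int]


def cell_id(lat: float, lon: float, cell_deg: float = 0.01) -> Cell:
    """Hypothetical stand-in for an S2 cell lookup: index into a flat grid."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def persistent_locations(
    pings: List[Ping], min_dwell_s: float = 300.0
) -> Tuple[List[Cell], List[float]]:
    """Collapse time-ordered pings into two aligned sequences.

    Consecutive pings in the same cell form one stay. Stays shorter than
    min_dwell_s (an assumed threshold) are treated as transit and dropped.
    Returns (locations, durations) with equal length.
    """
    locations: List[Cell] = []
    durations: List[float] = []
    if not pings:
        return locations, durations

    def flush(cell: Cell, start: float, end: float) -> None:
        # Keep the stay only if the agent dwelt long enough.
        if end - start >= min_dwell_s:
            locations.append(cell)
            durations.append(end - start)

    run_cell = cell_id(pings[0][1], pings[0][2])
    run_start = run_end = pings[0][0]
    for t, lat, lon in pings[1:]:
        c = cell_id(lat, lon)
        if c == run_cell:
            run_end = t  # still in the same cell: extend the stay
        else:
            flush(run_cell, run_start, run_end)
            run_cell, run_start, run_end = c, t, t
    flush(run_cell, run_start, run_end)
    return locations, durations


pings = [
    (0, 40.7128, -74.0060),
    (200, 40.7129, -74.0061),
    (400, 40.7127, -74.0062),
    (460, 40.7305, -74.0005),  # single ping in a new cell: transit
    (520, 40.7505, -73.9905),
    (1000, 40.7506, -73.9906),
]
locs, durs = persistent_locations(pings)
```

The two returned lists stay index-aligned, which is the property an encoder consuming parallel location and duration subsequences would rely on.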
Abstract
Spatiotemporal data faces many of the same challenges as natural language text, including the ordering of locations (words) in a sequence, long-range dependencies between locations, and locations that have multiple meanings. In this work, we propose a novel model that represents high-dimensional spatiotemporal trajectories as sequences of discrete locations and encodes them with a Transformer-based neural network architecture. Like a language model, our Sequence Transformer for Agent Representation Encodings (STARE) model can learn representations and structure in trajectory data through both supervised tasks (e.g., classification) and self-supervised tasks (e.g., masked modelling). We present experimental results on several synthetic and real trajectory datasets and show that the proposed model learns meaningful encodings that are useful for downstream tasks, including discriminating between labels and indicating similarity between locations. Using these encodings, we also learn relationships between the agents and locations present in spatiotemporal data.
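As an illustration of the masked modelling task mentioned above, the following sketch shows how a location sequence might be corrupted before being fed to an encoder. It is an assumption-laden example, not the paper's procedure: the reserved `MASK_ID`, the 15% masking rate, and the choice to leave the duration subsequence untouched are all hypothetical. The `IGNORE` value of -100 follows the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so unmasked positions would contribute nothing to such a loss.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0    # hypothetical reserved token id for a masked location
IGNORE = -100  # target value for positions that should not be scored


def mask_locations(
    location_ids: List[int],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Randomly replace location tokens with MASK_ID.

    Returns (inputs, targets). targets holds the original token at each
    masked position and IGNORE everywhere else, so a decoder is trained
    to recover only the hidden locations.
    """
    rng = random.Random(seed)
    inputs = list(location_ids)
    targets = [IGNORE] * len(location_ids)
    for i, token in enumerate(location_ids):
        if rng.random() < mask_prob:
            inputs[i] = MASK_ID
            targets[i] = token
    return inputs, targets


# Location ids start at 1 here so they never collide with MASK_ID.
sequence = [12, 7, 7, 31, 12, 5, 18, 7, 31, 12]
masked_inputs, masked_targets = mask_locations(sequence, mask_prob=0.3, seed=0)
```

Because only location tokens are hidden, the aligned dwell times remain available as context for predicting the masked positions, which is one plausible reading of how the two subsequences interact.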
