Encoding Agent Trajectories as Representations with Sequence Transformers

Athanasios Tsiligkaridis, Nicholas Kalinowski, Zhongheng Li, Elizabeth Hou

TL;DR

The paper tackles the problem of learning meaningful representations from spatiotemporal trajectories by treating them as sequences and encoding them with a Transformer-based encoder. It introduces STARE, which discretizes each trajectory into Persistent Locations (PLs), each mapped to an S2 cell and a dwell time, and then processes the two resulting aligned subsequences (locations and durations) with a unified Transformer Encoder Stack (TES). STARE uses two training heads, a classifier for agent or location labels and a masked location modelling decoder, to capture both agent-specific patterns and relationships between locations. It achieves strong performance on both simulated and real Patterns of Life (PoL) datasets and reveals interpretable structure in the embedding space. The work shows that STARE learns informative embeddings without road-network data and scales to large datasets. It also provides insights into agent-location dynamics and Patterns of Life, with potential uses in pretraining and in downstream tasks such as clustering and relationship discovery.
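As a rough illustration of the discretization step described in the TL;DR, the sketch below turns a list of already-extracted Persistent Locations into two aligned token subsequences, one for locations and one for durations. It assumes each PL is given as a precomputed S2 cell identifier string plus arrival and departure timestamps in seconds. The special tokens, the per-cell vocabulary, and the hourly duration bins are hypothetical choices for illustration, not details taken from the paper. Computing the S2 cell itself with an S2 geometry library is omitted.

```python
from dataclasses import dataclass

# Hypothetical special tokens; the paper's actual vocabulary scheme is not given here.
PAD, MASK, UNK = 0, 1, 2
NUM_SPECIAL = 3


@dataclass
class PersistentLocation:
    cell_id: str      # S2 cell identifier the PL falls in (assumed precomputed)
    arrival: float    # arrival time in seconds
    departure: float  # departure time in seconds


def build_vocab(trajectories):
    """Assign an integer location token to every S2 cell seen in the data."""
    vocab = {}
    for traj in trajectories:
        for pl in traj:
            if pl.cell_id not in vocab:
                vocab[pl.cell_id] = NUM_SPECIAL + len(vocab)
    return vocab


def duration_token(seconds, bin_size=3600.0, max_bins=24):
    """Quantize a dwell time into one of max_bins buckets (hourly bins are an assumption)."""
    b = int(seconds // bin_size)
    return NUM_SPECIAL + min(b, max_bins - 1)


def encode(traj, vocab, max_len):
    """Return aligned (location, duration) token lists, truncated or padded to max_len."""
    locs = [vocab.get(pl.cell_id, UNK) for pl in traj][:max_len]
    durs = [duration_token(pl.departure - pl.arrival) for pl in traj][:max_len]
    pad = max_len - len(locs)
    return locs + [PAD] * pad, durs + [PAD] * pad
```

Because the two lists share positions, position *i* of the location sequence and position *i* of the duration sequence always describe the same PL, which is what lets a single encoder consume them together (for example by summing their embeddings, one plausible design among several).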

Abstract

Spatiotemporal data faces many challenges analogous to those of natural language text, including the ordering of locations (words) in a sequence, long-range dependencies between locations, and locations having multiple meanings. In this work, we propose a novel model for representing high-dimensional spatiotemporal trajectories as sequences of discrete locations and encoding them with a Transformer-based neural network architecture. Similar to language models, our Sequence Transformer for Agent Representation Encodings (STARE) model can learn representations and structure in trajectory data through both supervised tasks (e.g., classification) and self-supervised tasks (e.g., masked modelling). We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings that are useful for many downstream tasks, including discriminating between labels and indicating similarity between locations. Using these encodings, we also learn relationships between agents and locations present in spatiotemporal data.
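The self-supervised task mentioned in the abstract is masked modelling over location tokens. A minimal sketch of the data side, assuming BERT-style random masking applied only to the location subsequence (durations left visible), might look like the following. The 15% default rate, the token ids, and the `-100` "ignore" label are common conventions assumed here, not values confirmed by the paper, and the 80/10/10 replacement split used by BERT is deliberately left out for brevity.

```python
import random

PAD, MASK = 0, 1   # hypothetical special token ids
IGNORE = -100      # label value meaning "no loss at this position"


def mask_locations(locs, mask_prob=0.15, rng=None):
    """Randomly mask a location-token sequence for masked location modelling.

    Returns (inputs, labels): inputs has the chosen positions replaced by MASK;
    labels holds the original token at those positions and IGNORE elsewhere.
    Padding is never masked, and at least one real token is masked if any exist.
    """
    rng = rng or random.Random()
    real = [i for i, t in enumerate(locs) if t != PAD]
    chosen = {i for i in real if rng.random() < mask_prob}
    if real and not chosen:
        chosen = {rng.choice(real)}
    inputs = [MASK if i in chosen else t for i, t in enumerate(locs)]
    labels = [t if i in chosen else IGNORE for i, t in enumerate(locs)]
    return inputs, labels


def masked_accuracy(pred, labels):
    """Fraction of masked positions whose predicted token matches the label."""
    scored = [(p, y) for p, y in zip(pred, labels) if y != IGNORE]
    return sum(p == y for p, y in scored) / len(scored) if scored else 0.0
```

A decoder head would then be trained to predict the original token only at positions whose label is not `IGNORE`; `masked_accuracy` is one simple way to score such predictions.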

Paper Structure

This paper contains 24 sections, 4 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: Example of extracted PLs (shown in red) from an agent's PoL along with the S2 cells (shown in yellow) that each PL resides in.
  • Figure 2: Our proposed STARE Transformer-based architecture, which uses a TES to create meaningful encodings of input data, an MLP to make classification predictions, and a Linear decoder to decode the encodings.
  • Figure 3: Matrix of average predicted probability scores for the (S) dataset (which consists of 37 agents), where the rows and columns are the true and predicted agent labels, respectively.
  • Figure 4: Visualization of similar PoLs among misclassified agents. The agents corresponding to the 1st and 3rd rows of the matrix shown in Figure 3 are shown in red and yellow, and those corresponding to the 8th and 9th rows are shown in green and pink.
  • Figure 5: Low-dimensional embeddings obtained by applying t-SNE to the high-dimensional TES embeddings for the (S) dataset. Agents that belong to the same underlying subpopulations tend to lie close to each other in the embedding space, as can be seen with agents [1, 3] and agents [8, 9], which use the same colors as in Figure 4.
  • ...and 14 more figures
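The matrix in Figure 3 can be reproduced from classifier outputs with a short routine like the one below. This sketch assumes each row averages the classifier's predicted probability vectors over all evaluation trajectories whose true agent label is that row's index; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def mean_probability_matrix(true_labels, probs, num_classes):
    """Row i is the average predicted distribution over samples whose true label is i.

    true_labels: list of int class ids, one per sample.
    probs: list of per-sample probability vectors, each of length num_classes.
    Rows for classes with no samples are left as all zeros.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = defaultdict(int)
    for y, p in zip(true_labels, probs):
        if len(p) != num_classes:
            raise ValueError("probability vector has wrong length")
        counts[y] += 1
        for j, pj in enumerate(p):
            sums[y][j] += pj
    return [
        [v / counts[i] for v in row] if counts[i] else row
        for i, row in enumerate(sums)
    ]
```

Large off-diagonal entries in the result flag agent pairs that the classifier tends to confuse, such as the pairs highlighted in Figures 4 and 5.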