Efficient Temporal Tokenization for Mobility Prediction with Large Language Models

Haoyu He, Haozheng Luo, Yan Chen, Qi R. Wang

TL;DR

RHYTHM reframes human mobility prediction as a hierarchical temporal tokenization problem, using a frozen large language model (LLM) backbone for spatio-temporal reasoning over pre-computed semantic prompts. By segmenting trajectories into daily tokens and applying intra- and inter-segment attention, RHYTHM shortens the input sequence while still capturing multi-scale temporal patterns. Semantic context enters through prompt embeddings computed offline, so the LLM can reason over that context without online prompt inference. The reported results are higher accuracy (notably Acc@1; a 2.4% improvement in accuracy overall) and a 24.6% reduction in training time compared to state-of-the-art methods. The gains hold across three real-world urban datasets and the approach scales with larger LLM backbones, offering a practical foundation for resource-aware mobility forecasting with foundation models.

Abstract

We introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a framework that leverages large language models (LLMs) as spatio-temporal predictors and trajectory reasoners. RHYTHM partitions trajectories into daily segments encoded as discrete tokens with hierarchical attention, capturing both daily and weekly dependencies while substantially reducing the sequence length. Token representations are enriched with pre-computed prompt embeddings via a frozen LLM, enhancing the model's ability to capture interdependencies without extensive computational overhead. By freezing the LLM backbone, RHYTHM achieves significant computational efficiency. Evaluation on three real-world datasets demonstrates a 2.4% improvement in accuracy, 5.0% increase on weekends, and 24.6% reduction in training time compared to state-of-the-art methods.
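To make the sequence-length argument concrete, the following is a minimal sketch rather than the authors' implementation. It splits a flat trajectory into daily segments and compares the number of pairwise attention interactions for flat attention against a two-level scheme (attention within each day, then attention across one token per day). The 30-minute sampling interval, the helper names `segment_by_day`, `attention_pairs`, and `hierarchical_attention_pairs`, and the use of pair counts as a cost proxy are all assumptions made for illustration.

```python
from typing import List


def segment_by_day(trajectory: List[int], steps_per_day: int) -> List[List[int]]:
    """Split a flat sequence of location IDs into consecutive daily segments."""
    if steps_per_day <= 0:
        raise ValueError("steps_per_day must be positive")
    if len(trajectory) % steps_per_day != 0:
        raise ValueError("trajectory length must be a multiple of steps_per_day")
    return [
        trajectory[start:start + steps_per_day]
        for start in range(0, len(trajectory), steps_per_day)
    ]


def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs in full self-attention over seq_len tokens."""
    return seq_len * seq_len


def hierarchical_attention_pairs(num_days: int, steps_per_day: int) -> int:
    """Pairs for intra-day attention on every segment plus inter-day attention."""
    intra = num_days * attention_pairs(steps_per_day)  # within each daily segment
    inter = attention_pairs(num_days)                  # across one token per day
    return intra + inter


# Hypothetical example: 7 days sampled every 30 minutes (48 steps per day).
steps_per_day = 48
num_days = 7
trajectory = [t % 5 for t in range(num_days * steps_per_day)]  # toy location IDs

segments = segment_by_day(trajectory, steps_per_day)
flat_cost = attention_pairs(len(trajectory))
hier_cost = hierarchical_attention_pairs(num_days, steps_per_day)

print(len(trajectory), len(segments))  # 336 time steps become 7 daily segments
print(flat_cost, hier_cost)            # 112896 pairs versus 16177 pairs
```

Under these assumptions, a week of observations shrinks from 336 positions to 7 segment tokens for the backbone, and the pair count drops by roughly a factor of seven. The real model uses learned embeddings and attention layers rather than raw counts, so this only illustrates the scaling behaviour.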

Paper Structure

This paper contains 39 sections, 18 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Motivation for RHYTHM. By partitioning trajectories into discrete tokens instead of a continuous stream, RHYTHM more effectively captures recurring mobility patterns.
  • Figure 2: The proposed architecture of RHYTHM. Historical trajectories are first converted into spatio-temporal embeddings and discretized via temporal tokenization (b), enabling hierarchical attention to capture both local and global dynamics. Each segment token is enriched with semantic trajectory embeddings, while future time-step tokens integrate task-context descriptors (a). The resulting token sequence is fed into a frozen LLM backbone, and an output projection layer produces the final location predictions.
  • Figure 3: Temporal performance patterns of RHYTHM and baselines on Sapporo data showing weekly (left) and daily (right) variations. The results demonstrate systematic performance fluctuations across both diurnal and weekly cycles.
  • Figure 4: Computational efficiency versus predictive accuracy trade-offs for RHYTHM and baseline approaches on the Sapporo dataset.
  • Figure 5: Computational performance across different LLM backbones using identical experimental settings from Table \ref{tab:scale}.