Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Siyu Li, Toan Tran, Haowen Lin, John Krumm, Cyrus Shahabi, Lingyi Zhao, Khurram Shafique, Li Xiong

TL;DR

Geo-Llama is a novel LLM fine-tuning approach for generating realistic synthetic human mobility trajectories under explicit spatiotemporal constraints. By encoding trajectories as text sequences and applying a visit-wise temporal permutation during training, it learns spatiotemporal movement patterns independent of visit order and supports both unconstrained and constraint-driven generation through in-context prompts. On GeoLife and MobilitySyn, Geo-Llama achieves higher realism across multiple metrics, better data efficiency, and robust performance under multiple visit constraints relative to strong baselines, including Geo-CETRA. The method offers a practical, privacy-preserving way to generate contextually coherent trajectories for transportation planning, urban analysis, and epidemiology, with potential extensions to richer auxiliary data and domain adaptation.
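
A minimal Python sketch of the training-side data preparation described above: each trajectory is rendered as a text sequence, and its visits are shuffled as whole units so that every arrival time stays paired with its location. The `HH:MM@location_id` text format, the function names, and the use of a uniformly random shuffle are illustrative assumptions, not Geo-Llama's published encoding.

```python
import random

# Assumed visit representation: (arrival time in minutes since midnight, location id).
Visit = tuple[int, int]


def encode_visit(visit: Visit) -> str:
    """Render one visit as text; the HH:MM@loc format is hypothetical."""
    minutes, loc = visit
    return f"{minutes // 60:02d}:{minutes % 60:02d}@{loc}"


def encode_trajectory(visits: list[Visit]) -> str:
    """Join the visits of one trajectory into a single text sequence."""
    return "; ".join(encode_visit(v) for v in visits)


def permute_visits(visits: list[Visit], rng: random.Random) -> list[Visit]:
    """Shuffle whole visits so time and location stay paired (assumed uniform shuffle)."""
    shuffled = list(visits)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return shuffled


def make_training_texts(visits: list[Visit], n_permutations: int, seed: int = 0) -> list[str]:
    """Produce several visit-order permutations of one trajectory as training texts."""
    rng = random.Random(seed)
    return [encode_trajectory(permute_visits(visits, rng)) for _ in range(n_permutations)]


if __name__ == "__main__":
    trajectory = [(510, 12), (720, 47), (1080, 3)]  # toy data, not from the paper
    print(encode_trajectory(trajectory))  # 08:30@12; 12:00@47; 18:00@3
    for text in make_training_texts(trajectory, n_permutations=3):
        print(text)
```

Because the model sees the same trajectory in many visit orders, it cannot rely on a visit's position in the sequence to infer its time; the time must be read from the visit itself. This is the property the approach relies on when constraint visits are placed in a prompt out of temporal order.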

Abstract

Generating realistic human mobility data is essential for various application domains, including transportation, urban planning, and epidemic control, as real data is often inaccessible to researchers due to high costs and privacy concerns. Existing deep generative models learn from real trajectories to generate synthetic ones. Despite this progress, most of them suffer from training instability and scale poorly with increasing data size. More importantly, they often lack control mechanisms to guide the generated trajectories under constraints such as enforcing specific visits. To address these limitations, we formally define the controlled trajectory generation problem for effectively handling multiple spatiotemporal constraints. We introduce Geo-Llama, a novel LLM fine-tuning framework that can enforce multiple explicit visit constraints while maintaining contextual coherence of the generated trajectories. In this approach, pre-trained LLMs are fine-tuned on trajectory data with a visit-wise permutation strategy, where each visit corresponds to a specific time and location. This strategy enables the model to capture spatiotemporal patterns regardless of visit order while enabling flexible, in-context constraint integration through prompts during generation. Extensive experiments on real-world and synthetic datasets validate the effectiveness of Geo-Llama, demonstrating its versatility and robustness in handling a broad range of constraints to generate more realistic trajectories compared to existing methods.
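
The controlled-generation path can be sketched in the same way: constraint visits are written into the prompt, the model's continuation is parsed back into visits, and all visits are reordered by arrival time, as the Figure 2 caption describes. This is a hedged illustration only: the LLM call is stubbed with a fixed response string, and the text format and helper names (`build_prompt`, `assemble_trajectory`, and so on) are hypothetical.

```python
import re

# Assumed visit representation: (arrival time in minutes since midnight, location id).
Visit = tuple[int, int]
VISIT_PATTERN = re.compile(r"(\d{2}):(\d{2})@(\d+)")  # matches the hypothetical HH:MM@loc format


def encode_visit(visit: Visit) -> str:
    minutes, loc = visit
    return f"{minutes // 60:02d}:{minutes % 60:02d}@{loc}"


def build_prompt(constraints: list[Visit]) -> str:
    """Write constraint visits into the prompt; an empty prompt means unconstrained generation.
    Constraint order does not matter because training used permuted visit orders."""
    if not constraints:
        return ""
    return "; ".join(encode_visit(v) for v in constraints) + "; "


def parse_visits(text: str) -> list[Visit]:
    """Recover (minutes, location) pairs, ignoring text that does not match the pattern."""
    return [(int(h) * 60 + int(m), int(loc)) for h, m, loc in VISIT_PATTERN.findall(text)]


def assemble_trajectory(prompt: str, response: str) -> list[Visit]:
    """Merge prompt and response visits and reorder them by arrival time."""
    return sorted(parse_visits(prompt) + parse_visits(response))


def satisfies_constraints(trajectory: list[Visit], constraints: list[Visit]) -> bool:
    """Check that every constraint visit appears in the final trajectory."""
    return all(c in trajectory for c in constraints)


if __name__ == "__main__":
    constraints = [(720, 47), (480, 12)]  # toy constraints, deliberately out of time order
    prompt = build_prompt(constraints)
    response = "07:15@5; 18:00@3"  # stand-in for the fine-tuned LLM's output
    trajectory = assemble_trajectory(prompt, response)
    print(trajectory)  # [(435, 5), (480, 12), (720, 47), (1080, 3)]
    print(satisfies_constraints(trajectory, constraints))  # True
```

Sorting the tuples orders visits by arrival time and breaks ties by location id. A real pipeline would also need to handle duplicated or conflicting visits in the model output, which this sketch does not attempt.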

Paper Structure (16 sections, 4 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Fine-tuning mechanism of Geo-Llama. First, the trajectories are converted into text strings through Textual Encoding. Next, Temporal-Order Permutation permutes the visits in the textual data. The permuted data is then used for fine-tuning.
  • Figure 2: Generation mechanism of Geo-Llama, which supports both uncontrolled and controlled generation. The model generates responses to the input prompts, and the resulting visits are reordered by arrival time to produce the final trajectories.
  • Figure 3: Data-efficient learning study. Some extremely high values for VAE are clipped for better visualization.
  • Figure 4: Impact of temporal-order permutation. The reported performance metrics represent average values across both datasets. For each dataset, the values are scaled to a range of 0 to 1 by dividing them by their respective maximum values.
  • Figure 5: Impact of constraint visits. The reported performance metrics represent average values across both datasets, with each metric scaled by its maximum value within each dataset.
  • ...and 3 more figures
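
Reading the Figure 4 and Figure 5 captions as "scale within each dataset, then average across datasets", the normalization for a single metric can be sketched as follows. The setting names and numbers are made-up placeholders, not results from the paper, and the metric values are assumed to be non-negative.

```python
# results[dataset][setting] holds one metric's value (assumed non-negative).
Results = dict[str, dict[str, float]]


def scale_by_max(values: dict[str, float]) -> dict[str, float]:
    """Divide every value by the largest one so the maximum becomes 1.0."""
    peak = max(values.values())
    if peak <= 0:
        raise ValueError("scaling by max requires a positive maximum")
    return {setting: v / peak for setting, v in values.items()}


def average_scaled(results: Results) -> dict[str, float]:
    """Scale each dataset independently, then average per setting across datasets."""
    scaled = [scale_by_max(per_setting) for per_setting in results.values()]
    settings = scaled[0].keys()
    return {s: sum(d[s] for d in scaled) / len(scaled) for s in settings}


if __name__ == "__main__":
    placeholder = {  # made-up numbers for illustration only
        "GeoLife": {"with_permutation": 1.0, "without_permutation": 4.0},
        "MobilitySyn": {"with_permutation": 3.0, "without_permutation": 6.0},
    }
    print(average_scaled(placeholder))  # {'with_permutation': 0.375, 'without_permutation': 1.0}
```

Scaling each dataset separately keeps a dataset with larger raw values from dominating the average; the trade-off is that the plotted numbers are relative, so only comparisons between settings within a figure are meaningful.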

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5