Urban Mobility Assessment Using LLMs

Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser

TL;DR

This work proposes an AI-based approach for synthesizing travel surveys by prompting large language models (LLMs). It finds that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics actual travel survey data, which supports using such data in mobility studies.

Abstract

Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data by means of user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics like the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data, and as such provides an argument for using such data in mobility studies.
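
To make the trip-level comparison concrete, the following is a minimal sketch of how first-order transition probabilities could be estimated from activity chains and then compared between actual and generated data. It is an illustration under stated assumptions, not the authors' implementation: the function names, the location categories, and the toy chains are all hypothetical, and the abstract does not specify how the paper computes or compares these probabilities.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Estimate first-order transition probabilities P(next | current).

    `chains` is a list of activity chains, each a list of location
    categories in the order they were visited (hypothetical format).
    """
    counts = defaultdict(Counter)
    for chain in chains:
        # Count every consecutive (current, next) pair in the chain.
        for current, nxt in zip(chain, chain[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[current] = {loc: n / total for loc, n in nxt_counts.items()}
    return probs


def probability_difference(actual, generated):
    """Return actual - generated for every (origin, destination) pair.

    A pair missing from one side is treated as probability 0.
    """
    diffs = {}
    for origin in set(actual) | set(generated):
        a = actual.get(origin, {})
        g = generated.get(origin, {})
        for dest in set(a) | set(g):
            diffs[(origin, dest)] = a.get(dest, 0.0) - g.get(dest, 0.0)
    return diffs


# Toy, made-up chains for illustration only (not data from the paper).
actual_chains = [
    ["home", "work", "home"],
    ["home", "school", "home"],
    ["home", "work", "shop", "home"],
]
generated_chains = [
    ["home", "work", "home"],
    ["home", "work", "home"],
]

actual_p = transition_probabilities(actual_chains)
generated_p = transition_probabilities(generated_chains)
diff = probability_difference(actual_p, generated_p)

for pair, value in sorted(diff.items()):
    print(pair, round(value, 3))
```

Differences close to zero for every pair would indicate that the generated trips follow transition patterns similar to the survey. The same chain representation could also feed an activity-chain-level comparison, for example by counting how often each complete chain occurs.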

Paper Structure (13 sections, 11 figures, 13 tables)

This paper contains 13 sections, 11 figures, 13 tables.

Figures (11)

  • Figure 1: LLM-based travel survey generation system.
  • Figure 2: Distribution of the number of locations traveled by survey respondents for the San Francisco-Oakland-Hayward metropolitan area. Llama-2 and its fine-tuned variant show the best alignment with actual survey data.
  • Figure 3: Travel time predictions. Gemini-Pro and GPT-4 models outperform Llama-2. The Llama-2-trained model matches survey travel times. Outliers are removed for better visualization.
  • Figure 4: Difference in first-order destination probabilities (actual - generated) per location category. Llama-2-trained outperforms the other models (shorter bars).
  • Figure 5: The cumulative sum of activity chain counts for the San Francisco-Oakland-Hayward metropolitan area's actual survey compared to the generated data. Llama-2-trained data closely matches the actual survey. Llama-2-trained and LA actual data are almost overlapping.
  • ...and 6 more figures