
TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models

Saipraneeth Devunuri, Lewis Lehe

TL;DR

TransitGPT presents a prompt-driven framework that lets users query GTFS data in natural language by having LLMs generate Python code, which is then executed on a server hosting the GTFS feed. The system emphasizes moderation, dynamic few-shot prompts, and a sandboxed execution environment, with a Summarizer turning results into human-friendly responses. On a 100-task benchmark with GPT-4o and Claude-3.5-Sonnet, TransitGPT+ (with dynamic examples and error handling) outperforms a zero-shot baseline in most categories, particularly with GPT-4o, while offering greater flexibility for data retrieval, computation, and visualization. The work demonstrates the potential to democratize transit data analysis through open-source accessibility, and it suggests future extensions to real-time feeds, GBFS, and additional data standards, alongside validation workflows to improve reliability.
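Dynamic few-shot prompting means the demonstrations placed in the prompt are chosen per query rather than fixed. The sketch below illustrates the idea under stated assumptions: the example bank is hypothetical, and the ranking uses a crude Jaccard token overlap purely for illustration. This summary does not say how TransitGPT measures similarity; an embedding-based similarity would be a natural alternative.

```python
import re

# Hypothetical bank of (question, reference code) pairs. In a real system these
# would be curated GTFS tasks whose code is pasted into the prompt as examples.
EXAMPLES = [
    ("How many stops are in the feed?", "result = len(feed.stops)"),
    ("List all route short names", "result = feed.routes['route_short_name'].tolist()"),
    ("What is the first departure time at stop X?", "result = None  # filter stop_times"),
    ("Plot the shape of route 10 on a map", "result = None  # build a map from shapes"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens used for a simple lexical similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k examples whose questions overlap most with the user query."""
    q = tokenize(query)
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(q, tokenize(ex[0])), reverse=True)
    return ranked[:k]
```

For a query such as "How many stops does the feed have?", the stop-count example ranks first, so its code would be the demonstration the LLM sees.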

Abstract

This paper introduces a framework that leverages Large Language Models (LLMs) to answer natural language queries about General Transit Feed Specification (GTFS) data. The framework is implemented in an open-source chatbot called TransitGPT. TransitGPT works by guiding LLMs to generate Python code that extracts and manipulates GTFS data relevant to a query, which is then executed on a server where the GTFS feed is stored. It can accomplish a wide range of tasks, including data retrieval, calculations, and interactive visualizations, without requiring users to have extensive knowledge of GTFS or programming. The LLMs that produce the code are guided entirely by prompts, without fine-tuning or access to the actual GTFS feeds. We evaluate TransitGPT using the GPT-4o and Claude-3.5-Sonnet LLMs on a benchmark dataset of 100 tasks to demonstrate its effectiveness and versatility. The results show that TransitGPT can significantly enhance the accessibility and usability of transit data.
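The key split is that the LLM only writes code, while the server runs it against the feed. A minimal sketch of that server-side step follows, assuming a hypothetical convention that generated code stores its answer in a variable named `result` and reads a `feed` object. The toy `stops.txt` content, the `run_generated_code` helper, and the allowed-builtins list are illustrative assumptions. Restricting builtins this way is not a real security boundary; a production sandbox needs process-level isolation.

```python
import csv
import io
from typing import Any, Optional

# Toy stand-in for a GTFS stops.txt table (real feeds are zipped CSV files).
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St,40.11,-88.24
S2,Green St,40.11,-88.23
S3,Transit Plaza,40.12,-88.24
"""


def load_table(text: str) -> list[dict[str, str]]:
    """Parse one GTFS CSV table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Builtins exposed to generated code; open, __import__, etc. are withheld.
# This limits accidents, NOT malicious code: it is not a true sandbox.
SAFE_BUILTINS = {"len": len, "sorted": sorted, "min": min, "max": max,
                 "sum": sum, "float": float, "int": int, "str": str, "list": list}


def run_generated_code(code: str, feed: dict) -> tuple[Any, Optional[str]]:
    """Execute LLM-generated code that must assign its answer to `result`.

    Returns (result, None) on success or (None, error_message) on failure,
    so the error text could be sent back to the LLM for a corrected attempt.
    """
    namespace = {"__builtins__": SAFE_BUILTINS, "feed": feed}
    try:
        exec(code, namespace)
    except Exception as exc:  # report the failure instead of crashing the server
        return None, f"{type(exc).__name__}: {exc}"
    if "result" not in namespace:
        return None, "Generated code did not assign a variable named `result`."
    return namespace["result"], None


feed = {"stops": load_table(STOPS_TXT)}
# Code an LLM might return for "How many stops are in the feed?"
answer, error = run_generated_code("result = len(feed['stops'])", feed)
```

Returning the error message instead of raising is what makes a retry loop possible: the message can be appended to the conversation so the LLM can revise its code.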

Paper Structure

This paper contains 16 sections, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Demonstrations of 'TransitGPT' in generating responses for GTFS data retrieval tasks. Sample visualizations generated using TransitGPT are available in the Sample Visualizations section.
  • Figure 2: TransitGPT Interface
  • Figure 3: Extended TransitGPT Architecture
  • Figure 4: Moderation Prompt
  • Figure 5: Excerpts from the main prompt covering various modules. Each module is delimited by an XML tag (shown in red).
  • ...and 7 more figures
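The XML-delimited module layout described for the main prompt can be sketched as below. The module names and contents here are hypothetical placeholders; the actual modules and wording of TransitGPT's prompt are not reproduced in this summary.

```python
def wrap(tag: str, body: str) -> str:
    """Delimit one prompt module with matching opening and closing XML-style tags."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>"


def build_main_prompt(modules: dict[str, str]) -> str:
    """Join modules in insertion order, each wrapped in its own tag."""
    return "\n\n".join(wrap(tag, body) for tag, body in modules.items())


# Hypothetical modules, for illustration only.
prompt = build_main_prompt({
    "role": "You are an expert in GTFS and Python.",
    "gtfs_schema": "stops.txt: stop_id, stop_name, stop_lat, stop_lon",
    "task": "Write Python code that stores the answer in `result`.",
})
```

Explicit tags give the LLM unambiguous boundaries between instructions, schema notes, and examples, which is the delimiting role the caption describes.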