GPS-2-GTFS: A Python package to process and transform raw GPS data of public transit to GTFS format

Shiveswarran Ratneswaran, Uthayasanker Thayasivam, Sivakumar Thillaiambalam

TL;DR

This work addresses the problem of converting raw public transit GPS trajectories into GTFS, including real-time GTFS-RT, to enable broader use in monitoring and planning. It introduces gps2gtfs, a modular Python pipeline that ingests automatic vehicle location (AVL) GPS data, preprocesses it, extracts trip trajectories, and matches them to transit stops to generate GTFS outputs, while handling challenges such as high data volume and localization errors. The two core methodologies, transit trip extraction and transit stop matching, rely on geo-buffering to identify trip boundaries and stop events, within an extensible software architecture comprising eight functional packages. The package is demonstrated on real-world data from Kandy, Sri Lanka, and is positioned as an open-source tool with potential for future visualizations, delay modeling, and real-time analytics in intelligent transportation systems (ITS) and public transit applications.

Abstract

The gps2gtfs package addresses a critical need: converting raw Global Positioning System (GPS) trajectory data from public transit vehicles into the widely used General Transit Feed Specification (GTFS) format. This transformation lets software applications use real-time transit data efficiently for purposes such as tracking, scheduling, and arrival time prediction. Developed in Python, gps2gtfs employs techniques such as geo-buffer mapping, parallel processing, and data filtering to manage the challenges of raw GPS data, including high volume, discontinuities, and localization errors. This open-source package, available on GitHub and PyPI, supports the development of intelligent transportation solutions and improved public transit systems globally.
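
To make the geo-buffer idea concrete, the following is a minimal sketch of stop matching, not the gps2gtfs implementation or its API. It assumes a circular buffer around a stop, takes the first GPS record inside the buffer as the arrival and the last as the departure, and reports the difference as dwell time. The function names `haversine_m` and `match_stop`, the 50 m radius, and the sample coordinates are all hypothetical choices made for illustration.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_stop(records, stop_lat, stop_lon, radius_m=50.0):
    """Match one trip's GPS records to one stop using a circular buffer.

    records: time-sorted list of (timestamp, lat, lon) for a single trip.
    Returns (arrival, departure, dwell_seconds), or None when no record
    falls inside the buffer. Assumes the trip passes the stop only once.
    """
    inside = [
        t for t, lat, lon in records
        if haversine_m(lat, lon, stop_lat, stop_lon) <= radius_m
    ]
    if not inside:
        return None
    arrival, departure = inside[0], inside[-1]
    return arrival, departure, (departure - arrival).total_seconds()


if __name__ == "__main__":
    # Hypothetical records sampled every 10 s while passing a stop.
    t0 = datetime(2022, 1, 1, 8, 0, 0)
    lats = [7.2890, 7.2903, 7.2906, 7.2908, 7.2920]
    trip = [(t0 + timedelta(seconds=10 * i), lat, 80.6337) for i, lat in enumerate(lats)]
    print(match_stop(trip, stop_lat=7.2906, stop_lon=80.6337))
```

Taking the first and last records inside the buffer is the simplest possible rule. A sparse GPS sampling rate can leave the buffer empty or hold a single record, which is why a real pipeline needs explicit handling for such cases.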

Paper Structure

This paper contains 10 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: High-level flowchart of the methodology of the 'gps2gtfs' package, which delivers GTFS information on trip travel times, stop dwell times, and segment run times for each transit trip
  • Figure 2: Pseudo-code of the bus trip extraction algorithm, which takes the raw GPS data and terminal location metadata as input and outputs the trip start and end times along with the trip GPS trajectory sequences
  • Figure 3: Three possible scenarios when capturing GPS records within the buffer area centered on a localized bus stop to determine the timestamps of the bus's arrival and departure
  • Figure 4: Software architecture diagram of the 'gps2gtfs' package covering all the major functionalities of the software
  • Figure 5: An illustration produced by the 'visualization - folium' module of the 'gps2gtfs' package, showing the GPS trajectory points collected for a single trip along its route and the buffers around bus stops used to filter the data points corresponding to bus arrival and departure
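
The trip extraction step described for Figure 2 takes raw GPS data and terminal locations as input and yields trip start and end times with their trajectories. The sketch below shows one way such a step could work. It is an assumption-based illustration rather than the paper's pseudo-code: a trip is taken to start at the first record outside one terminal's buffer and to end at the first record inside a different terminal's buffer. The names `extract_trips` and `terminal_at`, the 100 m radius, and the use of plain seconds as timestamps are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def terminal_at(lat, lon, terminals, radius_m):
    """Return the name of the terminal whose buffer contains the point, else None."""
    for name, (t_lat, t_lon) in terminals.items():
        if haversine_m(lat, lon, t_lat, t_lon) <= radius_m:
            return name
    return None


def extract_trips(records, terminals, radius_m=100.0):
    """Split one vehicle's time-sorted (timestamp, lat, lon) records into trips.

    terminals: dict mapping terminal name to (lat, lon).
    Records seen before the vehicle first enters a terminal are ignored,
    and a loop that returns to the same terminal is discarded.
    """
    trips = []
    origin = None   # terminal the vehicle most recently occupied
    current = []    # records collected since leaving that terminal
    for rec in records:
        t, lat, lon = rec
        here = terminal_at(lat, lon, terminals, radius_m)
        if here is None:
            if origin is not None:
                current.append(rec)
            continue
        if current and here != origin:
            trips.append({
                "origin": origin,
                "destination": here,
                "start_time": current[0][0],
                "end_time": t,
                "trajectory": current + [rec],
            })
        # Entering any terminal buffer resets the state for the next trip.
        origin, current = here, []
    return trips


if __name__ == "__main__":
    # Hypothetical terminals about 1.1 km apart and one out-and-back run.
    terminals = {"A": (7.2900, 80.6300), "B": (7.3000, 80.6300)}
    lats = [7.2900, 7.2905, 7.2920, 7.2960, 7.2995, 7.3000, 7.2960, 7.2900]
    records = [(10 * i, lat, 80.6300) for i, lat in enumerate(lats)]
    for trip in extract_trips(records, terminals):
        print(trip["origin"], trip["destination"], trip["start_time"], trip["end_time"])
```

Resetting the state whenever the vehicle is inside a terminal buffer keeps idle time at the terminal out of the trip trajectory, at the cost of trimming the first and last few metres of each trip.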