GPS-2-GTFS: A Python package to process and transform raw GPS data of public transit to GTFS format
Shiveswarran Ratneswaran, Uthayasanker Thayasivam, Sivakumar Thillaiambalam
TL;DR
This work addresses the problem of converting raw public transit GPS trajectories into GTFS, including the real-time GTFS-RT extension, so that the data can be used more widely in monitoring and planning. It introduces gps2gtfs, a modular Python pipeline that ingests automatic vehicle location (AVL) GPS data, preprocesses it, extracts trip trajectories, and matches them to transit stops to generate GTFS outputs, while handling challenges such as high data volume and localization errors. The two core methodologies, transit trip extraction and transit stop matching, rely on geo-buffering strategies to identify trip boundaries and stop events robustly, within an extensible software architecture comprising eight functional packages. The package is demonstrated on real-world data from Kandy, Sri Lanka, and is positioned as an open-source tool with potential for future visualizations, delay modeling, and real-time analytics in support of intelligent transportation systems (ITS) and public transit applications.
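To make the idea of buffer-based trip extraction concrete, the following is a minimal sketch rather than the gps2gtfs implementation. It assumes a circular buffer around each of two route terminals and treats the points recorded between leaving one buffer and entering the other as one trip. The function names, the 100 m radius, and the record layout `(timestamp, lat, lon)` are all hypothetical choices made for illustration.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def extract_trips(points, terminal_a, terminal_b, radius_m=100.0):
    """Split time-ordered (timestamp, lat, lon) records into trips.

    Hypothetical rule: a trip is the run of points recorded between
    leaving one terminal buffer and entering the other one.
    """
    def zone(lat, lon):
        # Return which terminal buffer (if any) contains the point.
        if haversine_m(lat, lon, *terminal_a) <= radius_m:
            return "A"
        if haversine_m(lat, lon, *terminal_b) <= radius_m:
            return "B"
        return None

    trips = []
    origin = None   # terminal buffer the vehicle was last seen in
    current = []    # points collected since leaving that buffer
    for ts, lat, lon in points:
        z = zone(lat, lon)
        if z is None:
            # Outside both buffers: collect only once an origin is known.
            if origin is not None:
                current.append((ts, lat, lon))
            continue
        # Inside a buffer: close the trip if the opposite terminal was reached.
        if current and z != origin:
            trips.append({"from": origin, "to": z, "points": current})
        # Points from a run that returned to the same terminal are discarded.
        current = []
        origin = z
    return trips
```

Points recorded before the vehicle is first seen inside a terminal buffer are ignored in this sketch, since their trip origin is unknown.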
Abstract
The gps2gtfs package addresses a critical need: converting raw Global Positioning System (GPS) trajectory data from public transit vehicles into the widely used General Transit Feed Specification (GTFS) format. This transformation lets software applications use real-time transit data efficiently for purposes such as tracking, scheduling, and arrival time prediction. Developed in Python, gps2gtfs employs techniques such as geo-buffer mapping, parallel processing, and data filtering to manage the challenges of raw GPS data, including high volume, discontinuities, and localization errors. This open-source package, available on GitHub and PyPI, supports the development of intelligent transportation solutions and is intended to help improve public transit systems worldwide.
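As an illustration of geo-buffer mapping for stop matching, here is a minimal sketch under stated assumptions, not the package's actual API. It assumes each stop gets a circular buffer, and that the first and last GPS timestamps falling inside that buffer approximate arrival and departure. The names `match_stops` and `haversine_m`, the 50 m radius, and the tuple layouts are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def match_stops(trip_points, stops, radius_m=50.0):
    """Match one trip's (timestamp, lat, lon) records to stops.

    `stops` is a list of (stop_id, lat, lon). A stop is reported only if
    at least one GPS point falls inside its buffer (assumed rule).
    """
    events = []
    for stop_id, stop_lat, stop_lon in stops:
        # Timestamps of all points inside this stop's circular buffer.
        inside = [
            ts for ts, lat, lon in trip_points
            if haversine_m(lat, lon, stop_lat, stop_lon) <= radius_m
        ]
        if inside:
            events.append({
                "stop_id": stop_id,
                "arrival": min(inside),    # first observation in the buffer
                "departure": max(inside),  # last observation in the buffer
            })
    return events
```

The resulting arrival and departure values are the kind of per-stop times a GTFS `stop_times.txt` record carries. The buffer radius trades off missed stops (too small, given localization error) against ambiguous matches between nearby stops (too large).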
