
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration

Juan D. Caicedo, Carlos Guirado, Marta C. González, Joan L. Walker

TL;DR

This paper argues that travel demand research suffers from reproducibility and benchmarking gaps that limit its policy relevance. It introduces an open-source benchmarking platform that implements five common ridership-prediction methods (ARIMA, SARIMA, MLP, CNN, LSTM) and evaluates them under stable, COVID-19, and protest conditions using five years of daily transit data from Bogotá, with standardized preprocessing to enable fair comparisons. The results show that online, multi-output models, particularly LSTM, adapt best to dynamic disruptions, reaching a MAAPE of about 0.12 during COVID-19, while under stable conditions the single-output online LSTM performs best (MAAPE ≈ 0.08). The study demonstrates the value of open data and code for rapid, policy-ready insights and calls for a broader shift toward collaborative, reproducible travel demand research that can quickly translate into operational guidance.
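Why online training helps under disruption can be illustrated with a deliberately simple toy example. It contrasts a static forecaster, fit once on the training window, with an online one that re-estimates from the most recent observations before each forecast day. Both forecasters (a fixed day-of-week profile and a rolling same-weekday mean), the synthetic series, and the 60% demand drop are hypothetical stand-ins chosen for illustration. They are not the paper's ARIMA, SARIMA, MLP, CNN, or LSTM specifications, and equating "online" with rolling re-estimation is an assumption of this sketch.

```python
import numpy as np

def daily_maape(actual, forecast):
    """Per-day arctangent absolute percentage error; averaging over days gives MAAPE."""
    return np.arctan(np.abs((actual - forecast) / actual))

def static_forecast(series, train_end):
    """Fit once: mean ridership per day of week over series[:train_end], never updated."""
    days = np.arange(len(series))
    profile = np.array(
        [series[:train_end][days[:train_end] % 7 == d].mean() for d in range(7)]
    )
    return profile[days[train_end:] % 7]

def online_forecast(series, train_end, window_weeks=4):
    """Re-estimate every day: mean of the same weekday over the previous window_weeks weeks."""
    preds = []
    for t in range(train_end, len(series)):
        lags = t - 7 * np.arange(1, window_weeks + 1)  # same weekday, 1..window_weeks weeks back
        preds.append(series[lags].mean())
    return np.array(preds)

# Hypothetical daily ridership: weekday/weekend pattern with a 60% drop from day 100 on.
pattern = np.array([1000, 1000, 1000, 1000, 1000, 600, 400], dtype=float)
series = np.tile(pattern, 20)
series[100:] *= 0.4
train_end = 70

static_err = daily_maape(series[train_end:], static_forecast(series, train_end))
online_err = daily_maape(series[train_end:], online_forecast(series, train_end))
print(f"static MAAPE: {static_err.mean():.3f}, online MAAPE: {online_err.mean():.3f}")
# static MAAPE: 0.562, online MAAPE: 0.283
```

The static forecaster keeps predicting pre-disruption demand indefinitely, while the online one converges to the new level once its four-week window lies entirely after the drop (from day 128 onward its error is zero in this noise-free toy).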

Abstract

This research foregrounds general practices in travel demand research, emphasizing the need to change our ways. A critical barrier preventing travel demand literature from effectively informing policy is the large volume of publications that lack clear, consolidated benchmarks, which makes it difficult for researchers and policymakers to gather insights and use models to guide decision-making. By emphasizing reproducibility and open collaboration, we aim to enhance the reliability and policy relevance of travel demand research. We present a collaborative infrastructure for transit demand prediction models, focusing on their performance during highly dynamic conditions such as the COVID-19 pandemic. Drawing from over 300 published papers, we develop an open-source infrastructure with five common methodologies and assess their performance under stable and dynamic conditions. We found that the prediction error of the LSTM deep learning approach stabilized at a mean arctangent absolute percentage error (MAAPE) of about 0.12 within 1.5 months, whereas the other models continued to exhibit higher error rates even a year into the pandemic. If research practices had prioritized reproducibility before the COVID-19 pandemic, transit agencies would have had clearer guidance on the best forecasting methods and could have quickly identified those best suited to pandemic conditions, informing operations as transit demand changed. The aim of this open-source codebase is to lower the barrier for other researchers to replicate and reproduce models and to build upon findings. We encourage researchers to test their own modeling approaches on this benchmarking platform, challenge the analyses conducted in this paper, and develop model specifications that outperform those evaluated here. Further, collaborative research approaches must be expanded across travel demand modeling if we wish to impact policy and planning.
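MAAPE is commonly defined (Kim and Kim, 2016) as MAAPE = (1/N) Σ_t arctan(|(A_t - F_t) / A_t|), where A_t is observed and F_t predicted ridership on day t. Assuming the paper uses this standard form, a minimal NumPy sketch follows. Treating a correctly forecast zero as zero error is a convention assumed for this sketch, not something stated in the paper.

```python
import numpy as np

def maape(actual, forecast) -> float:
    """Mean arctangent absolute percentage error.

    Each term arctan(|(A_t - F_t) / A_t|) lies in [0, pi/2], so a day with
    zero observed ridership contributes at most pi/2 instead of infinity,
    which is what makes MAAPE more robust than MAPE on low-demand days.
    """
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.abs((a - f) / a)  # inf when a == 0 and f != 0; nan when both are 0
    # Assumed convention: predicting zero when zero is observed counts as no error.
    ratio = np.where((a == 0) & (f == 0), 0.0, ratio)
    return float(np.mean(np.arctan(ratio)))

# One day 10% over-observed relative to the forecast, one day exact.
print(maape([1000, 800], [900, 800]))  # arctan(0.1) / 2, about 0.0498
```

Because arctan is nearly linear for small ratios, MAAPE values around 0.08 to 0.12 correspond roughly to typical percentage errors of 8% to 12%.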

Paper Structure (17 sections, 2 equations, 8 figures, 7 tables)


Figures (8)

  • Figure 1: BRT Daily Aggregated Demand From August 2015 to May 2021. Training Period: August 2015 to July 2018. Test Period: August 2018 to May 2021. Protest: November and December 2019. COVID-19: March 2020 to May 2021
  • Figure 2: Daily System-Wide Mean Arctangent Absolute Percentage Error Evolution for the Testing Period. (A) Single-output and static training, (B) Multi-output and static training, (C) Single-output and online training, and (D) Multi-output and online training.
  • Figure 3: Mean Arctangent Absolute Percentage Error in Stable Conditions
  • Figure 4: Mean Arctangent Absolute Percentage Error during COVID-19 condition
  • Figure 5: Mean Arctangent Absolute Percentage Error during protest condition
  • ...and 3 more figures
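Figures 3 to 5 report MAAPE separately for stable, COVID-19, and protest conditions. A minimal sketch of that grouping step follows, using the period boundaries from the Figure 1 caption. The caption gives months only, so the exact start and end days below are assumptions, as is treating every test day outside the protest and COVID-19 windows as "stable". The function names and the input format (a mapping from date to daily system-wide MAAPE, the quantity plotted in Figure 2) are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Month-level boundaries from the Figure 1 caption; exact days are assumed (inclusive).
TRAIN_START, TRAIN_END = date(2015, 8, 1), date(2018, 7, 31)
TEST_START, TEST_END = date(2018, 8, 1), date(2021, 5, 31)
PROTEST_START, PROTEST_END = date(2019, 11, 1), date(2019, 12, 31)
COVID_START, COVID_END = date(2020, 3, 1), date(2021, 5, 31)

def condition(day: date) -> str:
    """Label a day; any test day outside the protest and COVID-19 windows is 'stable' (assumption)."""
    if TRAIN_START <= day <= TRAIN_END:
        return "training"
    if not (TEST_START <= day <= TEST_END):
        return "out_of_range"
    if PROTEST_START <= day <= PROTEST_END:
        return "protest"
    if COVID_START <= day <= COVID_END:
        return "covid"
    return "stable"

def mean_error_by_condition(daily_error: dict[date, float]) -> dict[str, float]:
    """Average daily error within each test-period condition; training days are skipped."""
    groups: dict[str, list[float]] = defaultdict(list)
    for day, err in daily_error.items():
        label = condition(day)
        if label in ("stable", "protest", "covid"):
            groups[label].append(err)
    return {label: mean(values) for label, values in groups.items()}

# Hypothetical daily MAAPE values for four test-period days.
example = {
    date(2019, 5, 1): 0.08,
    date(2019, 11, 20): 0.20,
    date(2020, 4, 1): 0.30,
    date(2020, 4, 2): 0.10,
}
print(mean_error_by_condition(example))  # stable 0.08, protest 0.20, covid about 0.20
```

Keeping the boundaries as named constants makes it easy to rerun the comparison if the protest or COVID-19 windows are defined differently.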