
Forecasting MBTA Transit Dynamics: A Performance Benchmarking of Statistical and Machine Learning Models

Sai Siddharth Nalamalpu, Kaining Yuan, Aiden Zhou, Eugene Pinsky

TL;DR

This study benchmarks a broad set of statistical and machine-learning approaches to forecast MBTA subway ridership (gated station entries) and delays, emphasizing calendar features over weather. It systematically tests 11 models across multiple covariate combinations with bootstrap RMSE and SHAP analyses, and includes a novel Hawkes self-exciting point process to model delay events. Key findings show that day-of-week and seasonality provide stronger predictive signals than weather, with Random Forest, Gradient Boosting, and MLPs delivering the top day-ahead performance; Hawkes offers calibrated next-event forecasts but is less effective for daily counts. The work informs transit planning by identifying robust predictors and illustrating how different modeling paradigms contribute to reliability and passenger information, while outlining avenues for higher-resolution data and spatially informed extensions.

Abstract

The Massachusetts Bay Transportation Authority (MBTA) is the main public transit provider in Boston, operating multiple modes of transport, including trains, subways, and buses. However, the system often faces delays and fluctuations in ridership volume, which reduce efficiency and passenger satisfaction. To better understand these issues, this paper compares existing and novel methods to determine the best approach for predicting gated station entries in the subway system (a proxy for subway usage) and the number of delays across the overall MBTA system. It considers factors that tend to affect public transportation: day of week, season, pressure, wind speed, average temperature, and precipitation. The paper evaluates 10 statistical and machine learning models on predicting next-day subway usage. For predicting the delay count per day, the set is extended to 11 models by introducing a self-exciting point process model, a novel application of a point-process framework to MBTA delay modeling. Feature importance is assessed by selectively including features and measuring model accuracy via Root Mean Squared Error (RMSE). Notably, providing either day-of-week or season data improves predictive accuracy more than weather data does; in fact, providing weather data generally worsens performance, suggesting a tendency of models to overfit.
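
As a rough illustration of the RMSE metric named in the abstract and the bootstrap RMSE mentioned in the TL;DR, the sketch below computes both in plain Python. The function names, the toy numbers, and the use of a percentile bootstrap are assumptions made for illustration; the summary does not state how the resampling was actually performed.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE (an illustrative assumption)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample (actual, predicted) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * (n_resamples - 1))]
    hi = scores[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


# Hypothetical daily delay counts and next-day predictions (not from the paper).
actual = [120, 95, 140, 60, 210, 88, 132]
predicted = [110, 100, 150, 75, 180, 90, 125]
point = rmse(actual, predicted)
low, high = bootstrap_rmse(actual, predicted)
print(f"RMSE = {point:.2f}, 95% bootstrap interval = [{low:.2f}, {high:.2f}]")
```

Comparing such intervals between a model trained with and without a given feature group is one way to judge whether an apparent change in RMSE is larger than resampling noise.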

Paper Structure

This paper contains 9 sections, 3 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Histogram of Daily Delay Aggregates. The histogram uses 24 bins, as calculated by Rice’s Rule. The distribution of total delays per day approximates a gamma distribution and is right-skewed. The maximum number of delays on a given day is 337, while the minimum is 1. The figure plots the empirical distribution of total daily delay events (n = 24) and the maximum-likelihood Gamma density that best fits these data. Most days experience roughly 50 to 170 delays, while relatively few days have over 250 delays. The fitted three-parameter Gamma distribution (shape = 3.92, scale = 32.03, location = -11.6) highlights that days with extreme aggregate delay counts, though infrequent, are still plausible. The close alignment between the curve and the histogram supports the choice of a gamma distribution. Relatively few delays are expected on any given day, since delays generally stem from rare events such as mechanical breakdowns, weather, and construction. Such events do occur, however, as the tail of the distribution shows.
  • Figure 2: Histogram of Gated Station Entries. The histogram uses 32 bins, as calculated by Rice’s Rule. The distribution of gated station entries is bimodal, with two approximately normal peaks. The maximum number of gated station entries on a given day is 606,176, while the minimum is 342. The best-fit curve over the histogram suggests that more people use the MBTA on weekdays (e.g., to go to work or school), as represented by the first peak at 200,000–300,000 entries. The other peak, at 500,000 gated station entries, could be due to increased travel on holidays such as Thanksgiving, Christmas, and Independence Day.
  • Figure 3: Bar Graph of Model Performance on Delay Data. The bar graph shows that 7 out of 10 models predicted significantly more accurately given additional data, including all three highest-performing models. It also shows that random forest regression, multilayer perceptrons, and gradient boosting regression appear most suitable for delay prediction, ranking highest of all models.
  • Figure 4: Bar Graph of Model Performance on Gated Station Entries (GSE). The bar graph shows that 3 of 10 models predicted significantly more accurately given additional data. These are the three highest-performing models: random forest regression, multilayer perceptrons, and gradient boosting regression.
  • Figure 5: Bar Graph of Delay Model Improvement Given Additional Data. Of all data types, day-of-week data leads to the greatest decrease in error for delay prediction models, while weather data causes the greatest increase in error.
  • ...and 4 more figures
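
The Figure 1 and Figure 2 captions cite Rice's Rule for the number of histogram bins, and the Figure 1 caption reports a fitted three-parameter Gamma density. The sketch below shows both calculations using only the Python standard library. Rice's Rule is taken in its usual form, k = ceil(2 · n^(1/3)); the sample size passed to it is hypothetical, since the summary does not state the number of days in the dataset. The Gamma parameters are the ones quoted in the caption, and the helper names are assumptions made for illustration.

```python
import math


def rice_bins(n_samples):
    """Rice's Rule: number of histogram bins k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n_samples ** (1 / 3))


def gamma_pdf(x, shape, scale, loc=0.0):
    """Density of a three-parameter (shifted) Gamma distribution."""
    z = x - loc
    if z <= 0:
        return 0.0  # no probability mass at or below the location parameter
    return z ** (shape - 1) * math.exp(-z / scale) / (math.gamma(shape) * scale ** shape)


# Parameters quoted in the Figure 1 caption.
SHAPE, SCALE, LOC = 3.92, 32.03, -11.6

mean = SHAPE * SCALE + LOC        # mean of a shifted Gamma
mode = (SHAPE - 1) * SCALE + LOC  # mode, valid because shape > 1

print(rice_bins(1600))  # hypothetical sample size, not the paper's
print(f"mean = {mean:.1f}, mode = {mode:.1f}")
print(f"density at 100 delays = {gamma_pdf(100, SHAPE, SCALE, LOC):.5f}")
print(f"density at 300 delays = {gamma_pdf(300, SHAPE, SCALE, LOC):.5f}")
```

With the quoted parameters, the mean works out to roughly 114 delays per day and the mode to roughly 82, consistent with the caption's remark that most days see about 50 to 170 delays. The density at 300 delays is small but nonzero, matching the description of a long right tail.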