Predicting the duration of traffic incidents for Sydney greater metropolitan area using machine learning methods

Artur Grigorev, Sajjad Shafiei, Hanna Grzybowska, Adriana-Simona Mihaita

TL;DR

This study predicts the duration of traffic incidents in the Sydney Greater Metropolitan Area and classifies them as short-term or long-term using several machine learning models, including Gradient Boosted Decision Trees, Random Forest, LightGBM, and XGBoost. Using a dataset of 82 variables covering incident details, road network metrics, and socio-economic indicators, the authors show that XGBoost and LightGBM perform best, with XGBoost achieving a regression RMSE of 33.7 and a classification F1 score of 0.62 at a 30-minute threshold. Feature attribution via tree split counts and SHAP values identifies the number of affected lanes, traffic volume, and vehicle types as key drivers, giving traffic managers actionable insights. The results are useful for resource allocation and incident response planning, and the authors provide a public code repository to support reproducibility and further enhancements, such as spatial-temporal analyses and survival-model approaches.

Abstract

This research presents a comprehensive approach to predicting the duration of traffic incidents and classifying them as short-term or long-term across the Sydney Metropolitan Area. Using a dataset that combines detailed records of traffic incidents, road network characteristics, and socio-economic indicators, we train and evaluate a variety of advanced machine learning models, including Gradient Boosted Decision Trees (GBDT), Random Forest, LightGBM, and XGBoost. The models are assessed using Root Mean Square Error (RMSE) for regression tasks and F1 score for classification tasks. Our experimental results demonstrate that XGBoost and LightGBM outperform conventional models, with XGBoost achieving the lowest RMSE of 33.7 for predicting incident duration and the highest classification F1 score of 0.62 for a 30-minute duration threshold. For classification, the 30-minute threshold balances performance, with 70.84% short-term duration classification accuracy and 62.72% long-term duration classification accuracy. Feature importance analysis, employing both tree split counts and SHAP values, identifies the number of affected lanes, traffic volume, and types of primary and secondary vehicles as the most influential features. The proposed methodology not only achieves high predictive accuracy but also provides stakeholders with vital insights into the factors contributing to incident durations. These insights enable more informed decision-making for traffic management and response strategies. The code is available at: https://github.com/Future-Mobility-Lab/SydneyIncidents
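
To make the two evaluation metrics named in the abstract concrete, the following is a minimal sketch of how RMSE and a threshold-based F1 score could be computed. It is not the authors' implementation. Several points are assumptions made for illustration: the durations are taken to be in minutes, an incident counts as long-term when its duration strictly exceeds the 30-minute threshold, F1 is computed with long-term as the positive class, and "per-class accuracy" is read as the share of incidents of a class that were predicted as that class. The helper names and the sample durations are hypothetical, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted durations."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def to_long_term(durations, threshold=30.0):
    """Label 1 (long-term) if duration > threshold, else 0 (short-term).

    Assumption: the comparison is strict; the source does not say.
    """
    return [1 if d > threshold else 0 for d in durations]


def f1_score(labels_true, labels_pred, positive=1):
    """F1 score for the chosen positive class (assumed: long-term)."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_accuracy(labels_true, labels_pred, cls):
    """Share of incidents whose true class is `cls` that were predicted as `cls`."""
    preds = [p for t, p in zip(labels_true, labels_pred) if t == cls]
    return sum(1 for p in preds if p == cls) / len(preds)


# Made-up durations in minutes, for illustration only.
observed = [10.0, 25.0, 45.0, 90.0, 20.0, 35.0]
predicted = [15.0, 35.0, 40.0, 70.0, 18.0, 28.0]

true_labels = to_long_term(observed)
pred_labels = to_long_term(predicted)

print("RMSE:", rmse(observed, predicted))
print("F1 (long-term):", f1_score(true_labels, pred_labels))
print("Short-term accuracy:", per_class_accuracy(true_labels, pred_labels, 0))
print("Long-term accuracy:", per_class_accuracy(true_labels, pred_labels, 1))
```

In this sketch the classifier is derived by thresholding a regressor's output, which keeps the example short; a dedicated classifier trained on the thresholded labels would be scored the same way.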

Paper Structure

This paper contains 8 sections, 9 equations, 17 figures, 3 tables.

Figures (17)

  • Figure 1: Heatmap of traffic incidents with marked centroids. Source: OpenStreetMap
  • Figure 2: Incident duration statistics
  • Figure 3: Density Plots of Duration by Main Category and Primary Vehicle
  • Figure 4: Density Plots of Duration by Secondary Vehicle and Is Major Incident
  • Figure 5: Density Plots of Duration by Closure Type and Direction
  • ...and 12 more figures