Unraveling Pedestrian Fatality Patterns: A Comparative Study with Explainable AI

Methusela Sulle, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi, Nana Kankam Gyimah

TL;DR

This study addresses pedestrian fatalities by comparing the top five and bottom five U.S. states (2018–2022) using FARS data and an Ensemble Learning Framework (ELF) that integrates SMOTE for class balancing and SHAP for interpretable explanations. Among the evaluated models, XGBoost performs best, with a balanced accuracy of 98% and clear SHAP-driven insights that highlight age, alcohol/drug impairment, crash location, and visibility conditions as key predictors. The approach identifies high-risk zones and state- and region-specific drivers to inform targeted interventions, such as improved lighting, better pedestrian infrastructure, and stronger enforcement. The findings give policymakers and urban planners a practical, explainable AI-based basis for reducing pedestrian fatalities and improving road safety, while addressing the data imbalance and interpretability challenges of high-stakes decision-making.

Abstract

Road fatalities pose significant public safety and health challenges worldwide, and pedestrians are particularly vulnerable in vehicle-pedestrian crashes because of disparities in physical and performance characteristics. This study employs explainable artificial intelligence (XAI) to identify key factors contributing to pedestrian fatalities in the five U.S. states with the highest crash rates (2018-2022) and compares them with the five states with the lowest fatality rates. Using data from the Fatality Analysis Reporting System (FARS), the study applies machine learning techniques, including Decision Trees, Gradient Boosting Trees, Random Forests, and XGBoost, to predict contributing factors to pedestrian fatalities. The Synthetic Minority Over-sampling Technique (SMOTE) is used to address data imbalance, and SHapley Additive Explanations (SHAP) values are used to make the models interpretable. The results indicate that age, alcohol and drug use, location, and environmental conditions are significant predictors of pedestrian fatalities. The XGBoost model outperformed the others, achieving a balanced accuracy of 98%, accuracy of 90%, precision of 92%, recall of 90%, and an F1 score of 91%. Pedestrian fatalities are more common at mid-block locations and in areas with poor visibility, and older adults and substance-impaired individuals are at higher risk. These insights can help policymakers and urban planners implement targeted safety measures, such as improved lighting, enhanced pedestrian infrastructure, and stricter traffic law enforcement, to reduce fatalities and improve public safety.
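
To make the data-balancing step more concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' pipeline: the function name `smote_sketch`, the parameter `k`, and the toy feature rows are all hypothetical, and the sketch picks a random base sample on each draw rather than looping over every minority sample as the original algorithm does.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority-class rows by interpolating toward near neighbors.

    Hypothetical helper for illustration; assumes purely numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new row at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Made-up numeric feature rows for the minority class (not FARS data).
minority_rows = [[0.2, 0.1], [0.3, 0.2], [0.25, 0.4], [0.9, 0.8]]
majority_count = 10  # assumed size of the majority class

# Oversample until the minority class matches the majority class in size.
new_rows = smote_sketch(minority_rows, majority_count - len(minority_rows))
balanced_minority = minority_rows + new_rows
print(len(balanced_minority))
```

Because each synthetic row lies on a line segment between two real minority rows, oversampling of this kind adds plausible variation instead of duplicating records. In practice a maintained implementation would normally be used, applied to the training split only so that synthetic rows do not leak into evaluation.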

Paper Structure

This paper contains 32 sections, 15 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Identification of high-risk pedestrian fatality zones
  • Figure 2: Feature analysis flow using Machine Learning methods
  • Figure 3:
  • Figure 4: