Non-Linear Determinants of Pedestrian Injury Severity: Evidence from Administrative Data in Great Britain

Yifei Tong

TL;DR

This study uses 2023 STATS19 data to identify non-linear determinants of pedestrian injury severity in Great Britain and geographic disparities in outcomes. The author builds a preprocessing pipeline with mode imputation and SMOTE to address missing data and class imbalance, then applies Random Forest and XGBoost with SHAP for interpretability. Spatially linking collisions to Local Authority Districts reveals urban hotspots and rural districts with disproportionately severe outcomes. Model results highlight vehicle count, speed limits, lighting, and road surface as primary predictors, with police attendance and junction details adding discriminative power. The study demonstrates a practical, spatially informed ML framework to guide targeted speed management, infrastructure investment, and enforcement strategies for pedestrian safety.

Abstract

This study investigates the non-linear determinants of pedestrian injury severity using administrative data from Great Britain's 2023 STATS19 dataset. To address inherent data-quality challenges, including missing information and substantial class imbalance, we apply a preprocessing pipeline that combines mode imputation with the Synthetic Minority Over-sampling Technique (SMOTE). We use non-parametric ensemble methods (Random Forest and XGBoost) to capture complex interactions and heterogeneity often missed by linear models, and SHapley Additive exPlanations (SHAP) to keep the models interpretable and isolate marginal feature effects. Our analysis reveals that vehicle count, speed limits, lighting, and road surface conditions are the primary predictors of severity, with police attendance and junction characteristics further distinguishing severe collisions. Spatially, while pedestrian risk is concentrated in dense urban Local Authority Districts (LADs), certain rural LADs experience disproportionately severe outcomes conditional on a collision occurring. These findings underscore the value of combining spatial analysis with interpretable machine learning to guide geographically targeted speed management, infrastructure investment, and enforcement strategies.
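
To make the two preprocessing steps named in the abstract concrete, the following is a minimal standard-library sketch, not the paper's implementation. The function names `impute_mode` and `smote`, the use of `None` to mark missing values, and the parameters `k` and `seed` are all assumptions made for illustration. The SMOTE sketch handles numeric features only; the abstract does not say how categorical STATS19 fields were encoded before oversampling.

```python
import random
from collections import Counter


def impute_mode(rows):
    """Fill None entries in each column with that column's most frequent observed value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Rank the remaining minority points by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In practice the oversampling step is usually applied to the training split only, so that synthetic points do not leak into the evaluation data; whether the study did so is not stated in the abstract.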

Paper Structure

This paper contains 1 section, 7 figures, 5 tables.

Table of Contents

  1. Appendix

Figures (7)

  • Figure 1: Distribution of Accident Severity Classes
  • Figure 2: Distribution of Number of Vehicles/Casualties
  • Figure 3: Distribution of Days of Week and Nearest Hour
  • Figure 4: Choropleth Map of Pedestrian Collision Share by Local Authority District
  • Figure 5: SHAP Plot of Random Forest Models
  • ...and 2 more figures