Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis
Oscar Lares, Hao Zhen, Jidong J. Yang
TL;DR
The study addresses the prediction of traffic crash types with interpretable causal insights by fusing multisource data and introducing the Feature Group Tabular Transformer (FGTT). FGTT tokenizes semantically defined feature groups and processes them with a transformer encoder, outperforming Random Forest, XGBoost, and CatBoost in predictive performance while providing attention-based explanations. Key contributions include assembling a comprehensive MV crash dataset (6,810 instances, 33 features) and demonstrating that attention heatmaps and SHAP analyses highlight the dominant role of event-specific details and driver interactions in crash typology. The practical impact lies in a transparent, high-performing model that can inform targeted safety interventions and policy decisions; avenues for future enhancement include dynamic grouping and richer data sources.
Abstract
Reliable and interpretable traffic crash modeling is essential for understanding causality and improving road safety. This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources, including weather data, crash reports, high-resolution traffic information, pavement geometry, and facility characteristics. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups, represented as tokens. These group-based tokens serve as rich semantic components, enabling effective identification of collision patterns and interpretation of causal mechanisms. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance. Furthermore, model interpretation reveals key influential factors, providing fresh insights into the underlying causality of distinct crash types.
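The group-as-token idea can be illustrated with a minimal sketch. This is not the authors' implementation: the group names, column counts, embedding width, random weights, and the single-head attention with identity query/key/value maps are all assumptions chosen to keep the example small. A trained model would presumably use learned projections, a full transformer encoder, and a classification head.

```python
import math
import random

# Hypothetical feature groups (name -> number of raw columns); illustrative only.
FEATURE_GROUPS = {"weather": 3, "traffic": 4, "geometry": 2}
D_MODEL = 8  # assumed token embedding width


def make_projection(n_in, d_out, rng):
    """Random (n_in x d_out) matrix standing in for a learned linear layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(d_out)] for _ in range(n_in)]


def project(values, weights):
    """Map one group's raw feature vector to a single d_out-dimensional token."""
    d_out = len(weights[0])
    return [sum(v * w[j] for v, w in zip(values, weights)) for j in range(d_out)]


def tokenize(record, projections):
    """Produce one token per feature group, in the order of FEATURE_GROUPS."""
    return [project(record[name], projections[name]) for name in FEATURE_GROUPS]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    attn = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        for q in tokens
    ]
    mixed = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in attn
    ]
    return mixed, attn


rng = random.Random(0)
projections = {
    name: make_projection(n, D_MODEL, rng) for name, n in FEATURE_GROUPS.items()
}

# One made-up crash record, already split into the hypothetical groups.
record = {
    "weather": [0.2, 1.0, 0.0],
    "traffic": [0.5, 0.1, 0.9, 0.3],
    "geometry": [1.0, 0.4],
}

tokens = tokenize(record, projections)
mixed, attn = self_attention(tokens)
```

Here `attn` is a group-by-group matrix whose rows sum to one. Attention weights at this group level are the kind of quantity that an attention heatmap over feature groups would display.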
