Beyond Accuracy: A Unified Random Matrix Theory Diagnostic Framework for Crash Classification Models

Ibne Farabi Shihab; Sanjeda Akter; Anuj Sharma

TL;DR

We introduce a spectral diagnostic framework, grounded in Random Matrix Theory and Heavy-Tailed Self-Regularization, that spans the ML taxonomy. We observe a strong rank correlation between the power-law exponent $\alpha$ and expert agreement, suggesting that spectral quality captures model behaviors aligned with expert reasoning.

Abstract

Crash classification models in transportation safety are typically evaluated using accuracy, F1, or AUC, metrics that cannot reveal whether a model is silently overfitting. We introduce a spectral diagnostic framework grounded in Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) that spans the ML taxonomy: weight matrices for BERT/ALBERT/Qwen2.5, out-of-fold increment matrices for XGBoost/Random Forest, empirical Hessians for Logistic Regression, induced affinity matrices for Decision Trees, and Graph Laplacians for KNN. Evaluating nine model families on two Iowa DOT crash classification tasks (173,512 and 371,062 records respectively), we find that the power-law exponent $\alpha$ provides a structural quality signal: well-regularized models consistently yield $\alpha$ within $[2, 4]$ (mean $2.87 \pm 0.34$), while overfit variants show $\alpha < 2$ or spectral collapse. We observe a strong rank correlation between $\alpha$ and expert agreement (Spearman $\rho = 0.89$, $p < 0.001$), suggesting spectral quality captures model behaviors aligned with expert reasoning. We propose an $\alpha$-based early stopping criterion and a spectral model selection protocol, and validate both against cross-validated F1 baselines. Sparse Lanczos approximations make the framework scalable to large datasets.
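To make the role of the power-law exponent concrete, the following is a minimal sketch of how $\alpha$ might be estimated from the eigenvalue spectrum of a matrix. It is not the authors' implementation: the function names `esd_eigenvalues` and `fit_alpha` and the `tail_fraction` parameter are hypothetical, and the tail cutoff is fixed at a quantile here as a simplifying assumption, whereas a full power-law fit would normally also select the cutoff $x_{\min}$ from the data.

```python
import numpy as np


def esd_eigenvalues(W):
    """Eigenvalues of X = W^T W / N for an N x M matrix W (illustrative helper)."""
    n_rows = W.shape[0]
    # The squared singular values of W are the eigenvalues of W^T W.
    singular_values = np.linalg.svd(W, compute_uv=False)
    return singular_values ** 2 / n_rows


def fit_alpha(eigenvalues, tail_fraction=0.5):
    """Continuous power-law MLE on the upper tail of the spectrum.

    Assumption: the tail is the top `tail_fraction` of the positive
    eigenvalues, and its smallest value serves as x_min.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    lam = lam[lam > 0]
    k = max(2, int(len(lam) * tail_fraction))
    tail = lam[-k:]
    x_min = tail[0]
    # MLE for a density proportional to x^(-alpha) on x >= x_min.
    # Undefined if every tail value is identical (log terms all zero).
    return 1.0 + k / np.sum(np.log(tail / x_min))
```

Under these assumptions, `fit_alpha(esd_eigenvalues(W))` returns an estimate that can be compared against the $[2, 4]$ range reported in the abstract, with values below 2 read as a warning sign.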

Paper Structure (32 sections, 6 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Machine Learning for Crash Classification
Random Matrix Theory in Machine Learning
Theoretical Background
Empirical Spectral Density and Power-Law Fitting
Correlation Traps
Extension to Gradient-Boosted Trees and Ensembles
Proposed Effective Representation Matrices
Parametric Convex Models (Logistic Regression)
Partition Models (Decision Trees)
Instance-Based Models (K-Nearest Neighbors)
Methodology
Algorithmic Scalability
Crash Classification Tasks
...and 17 more sections

Figures (5)

Figure 1: Unified spectral diagnostic framework. Crash data trains a diverse taxonomy of models. Weight matrices (Deep Learning) or Effective Representation Matrices (Classical ML/Ensembles) are analyzed via RMT to extract $\alpha$ and trap counts, informing safety-critical deployment decisions.
Figure 2: Layer-wise $\alpha$ for BERT (top) and ALBERT (bottom). Shaded band = optimal range $[2,4]$; colored bands = min/max across 5 seeds. ALBERT's shared weights produce more uniform $\alpha$ with tighter variance.
Figure 3: Schematic ESD (log-log) of the XGBoost OOF correlation matrix. Left: well-regularized ($\alpha=2.34$, no traps). Right: overfit ($\alpha=1.62$, correlation traps visible as isolated spikes). MP distribution (dashed gray) serves as the null model. These are illustrative; actual $\alpha$ values are fitted via MLE on computed eigenvalues.
Figure 4: BERT training dynamics (median seed, INT task). Top: validation loss minimum at epoch 7. Bottom: $\hat{\alpha}$ crosses below 2.0 at epoch 5, providing an earlier structural warning.
Figure 5: Spectral quality ($\hat{\alpha}$) vs. expert agreement ($\kappa$) across all model families ($n=15$; DT/KNN-Overfit excluded due to collapsed spectra). Spearman $\rho = 0.89$ ($p < 0.001$; 95% bootstrap CI: $[0.74, 0.96]$). Shaded region = optimal $\alpha$ range. Qwen2.5-7B (zero-shot) occupies the upper-right quadrant.
