Spectral Thresholds for Identifiability and Stability: Finite-Sample Phase Transitions in High-Dimensional Learning
William Hao-Cheng Huang
TL;DR
This work identifies a finite-sample, necessary spectral boundary for identifiability and stability in high-dimensional learning. The central result, the Fisher spectral threshold, shows that identifiability requires the smallest Fisher eigenvalue $\lambda_{\min}(\Gamma)$ to exceed a finite-sample fluctuation floor $2\Lambda^*$. This floor marks a sharp phase transition, and linear convergence becomes possible once $\lambda_{\min}(\Gamma)$ exceeds it. The paper introduces the Constructive Fisher Floor, a practical regularizer that enforces a minimal spectral level, extends the theory to stochastic training with smoothing, and addresses robustness under preconditioning. Synthetic experiments on Gaussian mixtures and logistic models support the $d/n$ scaling and demonstrate how smoothing, regularization, and finite-direction monitoring can diagnose and enforce stability. Overall, the paper recasts classical eigenvalue conditions as a non-asymptotic spectral law, bridging statistical identifiability with learning-theoretic stability and providing actionable tools for robust high-dimensional inference.
Abstract
In high-dimensional learning, models remain stable until the sample size falls below a critical level, at which point they collapse abruptly. This instability is not algorithm-specific but reflects a geometric mechanism: when the weakest Fisher eigendirection falls beneath sample-level fluctuations, identifiability fails. Our Fisher Threshold Theorem formalizes this by proving that stability requires the minimal Fisher eigenvalue to exceed an explicit bound of order $O(\sqrt{d/n})$. Unlike prior asymptotic or model-specific criteria, this threshold is finite-sample and necessary, marking a sharp phase transition between reliable concentration and inevitable failure. To make the principle constructive, we introduce the Fisher floor, a verifiable spectral regularization that is robust to smoothing and preconditioning. Synthetic experiments on Gaussian mixtures and logistic models confirm the predicted transition, consistent with $d/n$ scaling. Statistically, the threshold sharpens classical eigenvalue conditions into a non-asymptotic law; learning-theoretically, it defines a spectral sample-complexity frontier, bridging theory with diagnostics for robust high-dimensional inference.
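As a rough illustration of the threshold comparison described above, the following sketch computes the empirical Fisher matrix of a logistic model and checks whether its smallest eigenvalue exceeds a floor of the form $2\Lambda^*$. Several pieces are assumptions made only for this example and are not taken from the paper: the form $\Lambda^* = c\sqrt{d/n}$ with a hypothetical constant `c`, the helper names `logistic_fisher`, `threshold_check`, and `clip_spectrum`, and the use of eigenvalue clipping as one simple way to enforce a minimal spectral level. The paper's explicit constants and its Constructive Fisher Floor may differ.

```python
import numpy as np


def logistic_fisher(X, theta):
    """Empirical Fisher matrix of a logistic model at theta.

    Gamma = (1/n) * sum_i p_i (1 - p_i) x_i x_i^T, with p_i = sigmoid(x_i . theta).
    """
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / X.shape[0]


def threshold_check(gamma, n, c=1.0):
    """Compare lambda_min(gamma) with the floor 2 * Lambda_star.

    ASSUMPTION: Lambda_star = c * sqrt(d / n); the constant c is hypothetical
    and stands in for the explicit bound, which is not reproduced here.
    """
    d = gamma.shape[0]
    lam_min = float(np.linalg.eigvalsh(gamma)[0])  # eigenvalues are ascending
    floor = 2.0 * c * np.sqrt(d / n)
    return lam_min, floor, lam_min > floor


def clip_spectrum(gamma, level):
    """Raise every eigenvalue of a symmetric matrix to at least `level`.

    ASSUMPTION: a simple stand-in for a spectral floor, not the paper's regularizer.
    """
    vals, vecs = np.linalg.eigh(gamma)
    return (vecs * np.maximum(vals, level)) @ vecs.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One well-sampled regime and one under-sampled regime.
    for n, d in [(5000, 5), (20, 10)]:
        X = rng.standard_normal((n, d))
        gamma = logistic_fisher(X, np.zeros(d))
        lam_min, floor, above = threshold_check(gamma, n)
        print(f"n={n} d={d} lambda_min={lam_min:.3f} floor={floor:.3f} above={above}")
```

With standard Gaussian features and `theta = 0`, the Fisher matrix is close to $0.25\,I$, so the check depends almost entirely on how $d/n$ compares with that level. This makes the dependence on sample size easy to see in a toy setting.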
