Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts
Ha Manh Bui, Anqi Liu
TL;DR
Density-Softmax is a deterministic framework that is efficient at test time. It unifies a Lipschitz-constrained feature extractor, a density model over the latent space, and a softmax classifier, and it is shown to solve a minimax uncertainty risk under distribution shifts. Training proceeds in three stages: learning a 1-Lipschitz feature extractor, estimating the latent density with Normalizing Flows, and density-weighted fine-tuning of the classifier. The resulting model makes distance-aware predictions in a single forward pass. It comes with theoretical guarantees that predictions approach the uniform distribution on out-of-distribution (OOD) inputs while in-distribution (IID) performance is preserved. Empirically, it delivers competitive robustness and uncertainty calibration on toy and large-scale benchmarks (e.g., CIFAR-10/100-C, ImageNet-C, CIFAR-10.1) while requiring lower test-time latency and fewer parameters than ensembles, and ablations confirm the complementary roles of the Lipschitz constraint and the latent density. The approach is practical for real-time, low-resource deployment and opens avenues for applying density-based uncertainty to pre-trained models and for using more scalable density estimators.
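The summary above does not say how the 1-Lipschitz constraint in the first stage is enforced. One common way to bound a layer's Lipschitz constant, assumed here rather than taken from the text, is spectral normalization: divide each weight matrix by an estimate of its largest singular value, so the linear map x -> W x is 1-Lipschitz in the Euclidean norm. The minimal NumPy sketch below estimates that singular value by power iteration; the function name `spectral_normalize` and its parameters are hypothetical.

```python
import numpy as np


def spectral_normalize(W, n_iters=50, seed=0):
    """Rescale W so that its largest singular value is at most 1.

    Hypothetical helper: the top singular value is estimated by power
    iteration, and W is only shrunk, never amplified.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = float(u @ W @ v)  # estimate of the largest singular value
    return W / max(sigma, 1.0)


# Usage: a matrix whose largest singular value is 4.
W = np.diag([4.0, 2.0, 0.5])
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # ≈ 1.0 (ord=2 gives the largest singular value)
```

Composing such layers with 1-Lipschitz activations (e.g., ReLU) keeps the whole feature extractor 1-Lipschitz, which is the property the first stage requires; the paper's exact construction may differ.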
Abstract
Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets, have become promising approaches to improving the quality of uncertainty estimation and robust generalization. However, they suffer from large model sizes and high test-time latency, which limits their scalability on low-resource devices and in real-time applications. To resolve these computational issues, we propose Density-Softmax, a sampling-free deterministic framework that combines a density function built on a Lipschitz-constrained feature extractor with the softmax layer. Theoretically, we show that our model is the solution of a minimax uncertainty risk and is distance-aware in feature space, thus reducing the over-confidence of the standard softmax under distribution shifts. Empirically, our method achieves results competitive with state-of-the-art techniques in terms of uncertainty and robustness, while having fewer model parameters and lower test-time latency.
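The distance-awareness claimed above can be illustrated with a toy sketch of a density-scaled softmax. Several assumptions here do not come from the text: a single Gaussian fitted to the training features stands in for the Normalizing Flow, the density is normalized by its maximum over the training features so it lies in (0, 1], and this normalized density multiplies the full logits before the softmax. The function names and classifier weights are hypothetical.

```python
import numpy as np


def fit_gaussian_log_density(Z):
    """Fit one Gaussian to training features Z (n x d); return a log p(z) function.

    Toy stand-in for the Normalizing Flow density estimator.
    """
    d = Z.shape[1]
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(d)  # small ridge for stability
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)

    def log_density(z):
        diff = np.atleast_2d(z) - mu
        maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance
        return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))

    return log_density


def density_softmax(z, W, b, log_density, max_train_log_density):
    """Softmax of logits scaled by the normalized density of the features z."""
    # Normalized likelihood p(z) / max over training p(z), clipped to at most 1 (assumption).
    scale = np.exp(np.minimum(log_density(z) - max_train_log_density, 0.0))
    logits = scale[:, None] * (np.atleast_2d(z) @ W + b)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Usage on a toy 2-D feature space.
rng = np.random.default_rng(0)
Z_train = rng.standard_normal((500, 2))
log_p = fit_gaussian_log_density(Z_train)
max_log_p = log_p(Z_train).max()

W = np.array([[3.0, -3.0], [0.0, 0.0]])  # hypothetical trained classifier weights
b = np.zeros(2)

near = np.array([[1.0, 0.0]])   # inside the training distribution
far = np.array([[100.0, 0.0]])  # far outside it
print(density_softmax(near, W, b, log_p, max_log_p))  # still favors class 0
print(density_softmax(far, W, b, log_p, max_log_p))   # [[0.5 0.5]]: uniform
```

Far from the training features the normalized density underflows to zero, the logits collapse to zero, and the prediction becomes uniform, whereas a standard softmax on the same point (logits 300 and -300) would be almost certain of class 0. Near the training data the scale stays positive, so the predicted class is unchanged and only the confidence is tempered.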
