
Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

Ha Manh Bui, Anqi Liu

TL;DR

Density-Softmax is a deterministic, test-time-efficient framework that unifies a Lipschitz-constrained feature extractor with a latent-space density model and a softmax classifier to achieve minimax uncertainty control under distribution shifts. Training proceeds in three stages (1-Lipschitz feature learning, latent density estimation via normalizing flows, and density-weighted classifier fine-tuning), after which the method produces distance-aware predictions in a single forward pass, with theoretical guarantees of uniform OOD behavior and preservation of IID performance. Empirically, it delivers competitive robustness and uncertainty calibration on toy and large-scale benchmarks (e.g., CIFAR-10/100-C, ImageNet-C, CIFAR-10.1) while reducing test-time latency and parameter counts relative to ensembles, and ablations confirm the complementary roles of the Lipschitz constraint and the latent density. The approach has practical value for real-time, low-resource deployment and opens avenues for applying density-based uncertainty to pre-trained models and to more scalable density estimators.

Abstract

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets, have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency at test-time, which limits the scalability needed for low-resource devices and real-time applications. To resolve these computational issues, we propose Density-Softmax, a sampling-free deterministic framework that combines a density function built on a Lipschitz-constrained feature extractor with the softmax layer. Theoretically, we show that our model is the solution of the minimax uncertainty risk and is distance-aware on the feature space, thus reducing the over-confidence of the standard softmax under distribution shifts. Empirically, our method achieves results competitive with state-of-the-art techniques in terms of uncertainty and robustness, while having fewer model parameters and lower latency at test-time.
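
As a rough illustration of how combining a latent density with the softmax layer can curb over-confidence, the sketch below scales classifier logits by a density value before applying the softmax. The function names, the example logits, and the assumption that the density likelihood has been rescaled to [0, 1] are illustrative choices made here, not details stated in the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def density_softmax(logits, density):
    """Scale logits by a latent density value, then apply softmax.

    Assumption: `density` is a likelihood already rescaled to [0, 1],
    close to 1 for in-distribution features and close to 0 far from
    the training data.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be scaled to [0, 1]")
    return softmax([density * v for v in logits])


# Hypothetical logits from a 3-class classifier head.
logits = [4.0, 1.0, -2.0]
iid_probs = density_softmax(logits, density=1.0)  # unchanged softmax
ood_probs = density_softmax(logits, density=0.0)  # collapses to uniform
```

With a density of 1 the prediction equals the standard softmax, while a density of 0 sends every logit to zero and yields a uniform distribution over classes, which matches the uniform OOD behavior described in the TL;DR.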

Paper Structure

This paper contains 30 sections, 7 theorems, 48 equations, 19 figures, 10 tables, 2 algorithms.

Key Result

Theorem 3.1

(Federer, 2014) If $f: \mathcal{X} \rightarrow \mathcal{Z}$ is a locally Lipschitz continuous function, then $f$ is differentiable almost everywhere. Moreover, if $f$ is Lipschitz continuous, then $L(f)=\sup_{x\in \mathbb{R}^n}\|\nabla_x f(x)\|_2$, where $L(f)$ is the Lipschitz constant of $f$.
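
The identity $L(f)=\sup_x \|\nabla_x f(x)\|_2$ can be checked numerically on a toy function. The sketch below is only an illustration: the scalar function `f`, the grid, and the finite-difference step are all chosen here for the example, and a maximum over sampled points gives a lower bound on the supremum rather than the supremum itself.

```python
import math


def f(x1, x2):
    """Toy scalar function with gradient (3*cos(3*x1), 0.5)."""
    return math.sin(3.0 * x1) + 0.5 * x2


def grad_norm(func, x1, x2, h=1e-5):
    """Euclidean norm of a central finite-difference gradient."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2.0 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2.0 * h)
    return math.hypot(d1, d2)


# Sample the gradient norm on a grid over [-2, 2] x [-2, 2].
grid = [i * 0.1 - 2.0 for i in range(41)]
lipschitz_estimate = max(grad_norm(f, a, b) for a in grid for b in grid)

# Closed form for this toy function: the norm peaks where cos(3*x1) = +/-1.
exact = math.sqrt(3.0 ** 2 + 0.5 ** 2)
```

For this function the grid contains a point where the gradient norm attains its maximum, so the sampled estimate agrees with the closed-form value up to finite-difference error.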

Figures (19)

  • Figure 1: The class probability $p(y|x)$ (Top Row) and predictive uncertainty $var(y|x) = p(y|x) \cdot (1 - p(y|x))$ surface (Bottom Row) as background colors in a comparison between our Density-Softmax and different approaches on the two moons 2D classification. Training data for positive (Orange) and negative classes (Blue) are shown. OOD data (Red) are not observed during training. Our Density-Softmax achieves distance awareness with a uniform class probability and high uncertainty value on OOD data. A quick demo is available at https://colab.research.google.com/drive/1dqaacHzOHUPFhBDcUw7yGL-zv7GSG-8P?usp=sharing.
  • Figure 2: The overall architecture of Density-Softmax, including an encoder $f$, a classifier $g$, and a density function $p(Z;\alpha)$. The rectangular boxes represent these functions. The circle with two crossed lines represents the softmax layer. The 3 training steps and inference process follow Algorithm \ref{alg:algorithm}.
  • Figure 3: Reliability diagram of Density-Softmax vs. different approaches, trained on CIFAR-10 and tested on CIFAR-10.1 (v6). Density-Softmax is better calibrated than the others. Details are in Apd. \ref{apd:results_calib}.
  • Figure 4: (a) PDF plot of predictive entropy $\mathrm{H}(p(y|x))$ for the semantic shift. Density-Softmax provides the highest entropy with high density for OOD; (b) Inference cost comparison at test-time on ImageNet. Our model consistently outperforms SOTA across modern GPU architectures; (c) Histogram of the likelihood $p(z;\alpha)$. Blue is the CIFAR-10 training set, Orange is the IID test set, and Green, Red, Purple, Brown, and Pink are OOD sets at shift levels 1-5 on CIFAR-10-C. The density produces high values on IID data and lower values on OOD data, decreasing with the shift intensity level.
  • Figure 5: Comparison of feature visualizations between models with and without the 1-Lipschitz constraint on CIFAR-10-C (a & b), and of reliability diagrams between models with and without the density function on CIFAR-10 (c & d).
  • ...and 14 more figures
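
The two uncertainty measures named in the captions of Figure 1 and Figure 4, the predictive variance $p(y|x) \cdot (1 - p(y|x))$ and the predictive entropy $\mathrm{H}(p(y|x))$, can be computed as in the minimal sketch below. The helper names and the example probability vectors are hypothetical and are not taken from the paper's code.

```python
import math


def binary_variance(p):
    """Predictive variance p * (1 - p) of a binary class probability."""
    return p * (1.0 - p)


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = [0.99, 0.01]  # an over-confident prediction
uniform = [0.5, 0.5]      # a uniform prediction, as desired on OOD data

var_confident = binary_variance(confident[0])
var_uniform = binary_variance(uniform[0])
h_confident = entropy(confident)
h_uniform = entropy(uniform)
```

Both measures are largest for the uniform prediction: the variance reaches 0.25 at $p = 0.5$ and the binary entropy reaches $\ln 2$, which is why a uniform class probability on OOD data shows up as high uncertainty in the figures.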

Theorems & Definitions (19)

  • Theorem 3.1
  • Remark 3.2
  • Remark 3.4
  • Remark 3.5
  • Definition 4.1
  • Definition 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Theorem 4.6
  • ...and 9 more