Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers

Vahid Hemmati, Ahmad Mohammadi, Abdul-Rauf Nuhu, Reza Ahmari, Parham Kebria, Abdollah Homaifar

TL;DR

Volatility in Certainty (VC) is a label-free metric for detecting adversarial perturbations by measuring dispersion in sorted softmax confidences. The paper formalizes a per-sample certainty $\delta_i$, a local volatility $vc_k$, and an aggregate VC score, with the objective that $\text{Corr} \left( \log VC(h_w), \text{Acc}(h_w) \right) \to -1$, meaning that smoother confidence landscapes correspond to higher accuracy. Empirical results on MNIST and CIFAR-10 with an ANN, a CNN, and a regularized VGG show a strong negative correlation ($\rho < -0.90$ in most cases) between $\log(VC)$ and accuracy, even under FGSM perturbations, supporting VC as a scalable, real-time, label-free indicator of robustness and adversarial drift. The work highlights VC's potential as an early-warning signal in edge scenarios and suggests avenues for VC-informed training and OOD generalization assessment without labeled validation data.
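
One way to write these quantities out, as a sketch rather than the paper's exact formulation. The abstract states only that VC is the average squared log-ratio of adjacent certainty values after sorting. The order-statistic notation $\delta_{(k)}$, the sample count $N$, and the $1/(N-1)$ normalization below are assumptions introduced for illustration.

```latex
\begin{align}
  % Certainty values of the N evaluated samples, sorted in ascending order (assumed notation)
  \delta_{(1)} &\le \delta_{(2)} \le \dots \le \delta_{(N)} \\
  % Local volatility: squared log-ratio of adjacent sorted certainties
  vc_k &= \left( \log \frac{\delta_{(k+1)}}{\delta_{(k)}} \right)^{2}, \qquad k = 1, \dots, N-1 \\
  % Aggregate score: average of the local volatilities (normalization assumed)
  VC(h_w) &= \frac{1}{N-1} \sum_{k=1}^{N-1} vc_k
\end{align}
```

Under this reading, a perfectly flat sorted certainty curve gives $VC = 0$, and abrupt jumps between neighbouring certainties raise the score.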

Abstract

Adversarial robustness remains a critical challenge in deploying neural network classifiers, particularly in real-time systems where ground-truth labels are unavailable during inference. This paper investigates Volatility in Certainty (VC), a recently proposed, label-free metric that quantifies irregularities in model confidence by measuring the dispersion of sorted softmax outputs. Specifically, VC is defined as the average squared log-ratio of adjacent certainty values, capturing local fluctuations in model output smoothness. We evaluate VC as a proxy for classification accuracy and as an indicator of adversarial drift. Experiments are conducted on artificial neural networks (ANNs) and convolutional neural networks (CNNs) trained on MNIST, as well as a regularized VGG-like model trained on CIFAR-10. Adversarial examples are generated using the Fast Gradient Sign Method (FGSM) across varying perturbation magnitudes. In addition, mixed test sets are created by gradually introducing adversarial contamination to assess VC's sensitivity under incremental distribution shifts. Our results reveal a strong negative correlation between classification accuracy and $\log(\text{VC})$ ($\rho < -0.90$ in most cases), suggesting that VC effectively reflects performance degradation without requiring labeled data. These findings position VC as a scalable, architecture-agnostic, and real-time performance metric suitable for early-warning systems in safety-critical applications.
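
The definition in the abstract (average squared log-ratio of adjacent sorted certainty values) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that a sample's certainty is its top-1 softmax probability and that the average runs over the N-1 adjacent pairs; the function names `softmax` and `volatility_in_certainty` and the `floor` parameter are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, subtracting the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def volatility_in_certainty(probs: np.ndarray, floor: float = 1e-12) -> float:
    """Average squared log-ratio of adjacent sorted certainty values.

    probs: array of shape (n_samples, n_classes) holding softmax outputs.
    ASSUMPTION: certainty of a sample = its largest class probability.
    No labels are used anywhere in this computation.
    """
    if probs.ndim != 2 or probs.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two samples")
    certainty = probs.max(axis=1)
    # Sort ascending and guard against log(0) with a small floor.
    ordered = np.sort(np.clip(certainty, floor, None))
    # log(a / b) == log(a) - log(b), so adjacent log-ratios are a first difference.
    log_ratios = np.diff(np.log(ordered))
    return float(np.mean(log_ratios ** 2))


if __name__ == "__main__":
    # Synthetic logits only; these numbers are not results from the paper.
    rng = np.random.default_rng(0)
    logits = rng.normal(scale=3.0, size=(1000, 10))
    vc = volatility_in_certainty(softmax(logits))
    print(f"VC = {vc:.3e}, log(VC) = {np.log(vc):.3f}")
```

Because the score depends only on the sorted certainties, it is invariant to the order in which samples arrive, so it can be recomputed over a sliding window of recent predictions during inference.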

Paper Structure

This paper contains 14 sections, 6 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Inverse correlation between model accuracy and $\log(\text{VC})$ for an ANN on MNIST, indicating $\log(\text{VC})$ as a reliable marker of generalization loss.
  • Figure 2: Inverse correlation between model accuracy and $\log(\text{VC})$ for a CNN on MNIST, highlighting $\log(\text{VC})$ as a sensitive generalization indicator.
  • Figure 3: Inverse correlation between model accuracy and $\log(\text{VC})$ for a VGG network on CIFAR-10, confirming its effectiveness as a generalization indicator.
  • Figure 4: Accuracy vs. $\log(\text{VC})$ for ANN and CNN models on FGSM-perturbed MNIST data.
  • Figure 5: Accuracy vs. $\log(\text{VC})$ for VGG on FGSM-CIFAR-10 ($n=10000$, $\epsilon \in [0.000, 0.030]$). Strong inverse correlation ($\rho = -0.85$) reveals VGG’s greater vulnerability.
  • ...and 1 more figures