Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang, Jun Zhu, Deyu Meng

Abstract

Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with modality-specific patches. This study first reveals that they share a common geometric origin: the input and its loss gradient are conjugate observables subject to an irreducible uncertainty bound. Formalizing a Neural Uncertainty Principle (NUP) under a loss-induced state, we find that in near-bound regimes, further compression must be accompanied by increased sensitivity dispersion (adversarial fragility), while weak prompt-gradient coupling leaves generation under-constrained (hallucination). Crucially, this bound is modulated by an input-gradient correlation channel, which we capture with a purpose-built probe that requires only a single backward pass. In vision, masking highly coupled components improves robustness without costly adversarial training; in language, the same probe, applied at the prefill stage, detects hallucination risk before any answer tokens are generated. NUP thus turns two seemingly separate failure taxonomies into a shared uncertainty-budget view and provides a principled lens for reliability analysis. Guided by this theory, we propose ConjMask (masking high-contribution input components) and LogitReg (logit-side regularization) to improve robustness without adversarial training, and we use the probe as a decoding-free risk signal for LLMs, enabling hallucination detection and prompt selection. Overall, NUP provides a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks.
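As a rough illustration of the masking idea, the sketch below uses a toy linear softmax classifier in NumPy, computes the input gradient of the cross-entropy loss in closed form (the analogue of one backward pass), and zeroes the fraction of input components with the largest coupling |x_i * g_i|. The scoring rule, the masking fraction `frac`, and the helper names `input_grad` and `conj_mask` are hypothetical choices for this example, not the authors' ConjMask implementation.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def input_grad(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x for logits = W @ x + b."""
    p = softmax(W @ x + b)
    p[y] -= 1.0          # dL/dlogits = softmax(logits) - one_hot(y)
    return W.T @ p       # chain rule back to the input


def conj_mask(x, g, frac=0.1):
    """Hypothetical ConjMask-style step: zero the inputs with the largest |x_i * g_i|."""
    score = np.abs(x * g)                    # per-component input-gradient coupling
    k = max(1, int(round(frac * x.size)))    # number of components to mask
    idx = np.argsort(score)[-k:]             # indices of the k most strongly coupled components
    masked = x.copy()
    masked[idx] = 0.0
    return masked, idx


# Toy usage: a random 10-class linear model on a 64-dimensional input.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 64)), np.zeros(10)
x, y = rng.normal(size=64), 3
g = input_grad(W, b, x, y)
x_masked, masked_idx = conj_mask(x, g, frac=0.1)
```

In a training loop the masked input would replace the original at each step; the selection rule and schedule actually used by ConjMask may differ.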

Paper Structure (51 sections, 8 theorems, 38 equations, 3 figures, 5 tables)

Key Result

Theorem 3.1

For the canonical pair $(\hat{x}_u,\hat{p}_u)$ and the state $\psi_c$ with finite second moments, $\Delta \hat{x}_u\,\Delta \hat{p}_u \ge \kappa$, where $\kappa=\tfrac{1}{2}|\langle[\hat{x}_u,\hat{p}_u]\rangle_c|=\tfrac{1}{2}$.
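A quick numerical check, assuming the relation takes the standard Robertson form $\Delta \hat{x}_u\,\Delta \hat{p}_u \ge \kappa$ and the textbook realization in which $\hat{x}_u$ acts by multiplication and $\hat{p}_u=-i\,d/dx$ on a one-dimensional grid (so $[\hat{x}_u,\hat{p}_u]=i$ and $\kappa=\tfrac{1}{2}$). The function name `dispersions` is hypothetical, and the test states are standard examples rather than the paper's loss-induced state $\psi_c$: a Gaussian saturates the bound, while a two-peaked state exceeds it.

```python
import numpy as np


def dispersions(psi, x):
    """Return (dx, dp) for psi sampled on a uniform grid x, with p = -i d/dx (hbar = 1)."""
    h = x[1] - x[0]
    psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * h)   # normalize so that sum |psi|^2 h = 1
    rho = np.abs(psi) ** 2
    mean_x = np.sum(x * rho) * h
    var_x = np.sum((x - mean_x) ** 2 * rho) * h
    dpsi = np.gradient(psi, h)                           # central-difference d psi / dx
    mean_p = np.real(np.sum(np.conj(psi) * (-1j) * dpsi) * h)
    mean_p2 = np.sum(np.abs(dpsi) ** 2) * h              # <p^2> = integral of |psi'|^2 when psi vanishes at the edges
    return np.sqrt(var_x), np.sqrt(mean_p2 - mean_p ** 2)


x = np.linspace(-20.0, 20.0, 4001)
gauss = np.exp(-x ** 2 / 2)                                        # minimum-uncertainty state
two_peak = np.exp(-(x - 3) ** 2 / 2) + np.exp(-(x + 3) ** 2 / 2)   # spread-out state

for name, psi in [("gaussian", gauss), ("two-peak", two_peak)]:
    dx, dp = dispersions(psi, x)
    print(f"{name}: dx*dp = {dx * dp:.4f}  (bound kappa = 0.5)")
```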

Figures (3)

  • Figure 1: Illustration of the Neural Uncertainty Principle (NUP) in the $(\Delta \hat{m}_u^\star,\Delta \hat{p}_u)$ plane. Here $\Delta \hat{m}_u^\star$ is the minimum dispersion achievable by a mixed observable $\hat{m}_u(\lambda)=\hat{x}_u+\lambda \hat{p}_u$, and $\Delta \hat{p}_u$ is the dispersion of the conjugate operator $\hat{p}_u$ under the loss-phase (boundary-emphasized) state. NUP forbids the region $\Delta \hat{m}_u^\star \Delta \hat{p}_u < \tfrac{1}{2}$. Vision boundary stress concentrates toward the upper left (small $\Delta \hat{m}_u^\star$, large $\Delta \hat{p}_u$), whereas LLM under-conditioning and hallucination occupy a larger region away from the uncertainty boundary (large $\Delta \hat{m}_u^\star$, uncontrolled $\Delta \hat{p}_u$). Reliable behavior lies in an intermediate "Goldilocks" band where neither term is extreme. Note: strictly, $\Delta \hat{m}_u^\star$ and $\Delta \hat{p}_u$ are the dispersions of the operators $\hat{m}_u$ and $\hat{p}_u$ under the loss-induced state (hence the hats). The figure plots numerical estimates of these dispersions and, for simplicity, omits the hats in its axis labels; the plotted quantities correspond one-to-one to $\Delta \hat{m}_u^\star$ and $\Delta \hat{p}_u$ as defined in the equations.
  • Figure 2: Evolution of the CC-Probe and accuracy during training. We plot evaluation accuracy (red, right axis) and the mean absolute input–gradient cosine $\bar{c}_{\text{img}}$ (left axis; Eq. \ref{eq:c_img}), computed on the held-out evaluation split every 10 epochs and reported separately for correctly classified versus misclassified samples (green vs. blue). Gradients are taken w.r.t. the standard cross-entropy loss using the ground-truth label. (a) CIFAR-10 (ResNet-18, DenseNet-121, ViT-Tiny, EfficientNet-B0). (b) Tiny-ImageNet-200 (ResNet-50, DenseNet-121, EfficientNet-B4, Swin-Tiny).
  • Figure 3: Effect of gradient-aligned perturbations on accuracy and $\bar{c}_\text{img}$. We evaluate ResNet-18, ViT, and Vim on CIFAR-10 and ImageNet-100. Left column: accuracy; right column: $\bar{c}_\text{img}$. We compare Clean, $+$FGSM, and $-$FGSM across perturbation levels $\epsilon$ (see Supplementary Material for exact values).
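The quantities in Figures 2 and 3 can be sketched on a toy model: a mean absolute input–gradient cosine obtained from one backward pass, and $\pm$FGSM steps $x \pm \epsilon\,\mathrm{sign}(\nabla_x L)$. This assumes a random linear softmax classifier in NumPy with closed-form input gradients, and assumes $\bar{c}_{\text{img}}$ is the sample mean of $|\cos(x, \nabla_x L)|$ over flattened inputs; the paper's own definition may differ in normalization or aggregation, and all helper names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def loss_and_input_grad(W, b, X, y):
    """Per-sample cross-entropy and its gradient w.r.t. X for logits = X @ W.T + b."""
    rows = np.arange(len(y))
    P = softmax(X @ W.T + b)          # (N, C) class probabilities
    loss = -np.log(P[rows, y])
    P[rows, y] -= 1.0                 # dL/dlogits = softmax - one_hot(y)
    return loss, P @ W                # (N, D) input gradients, the analogue of one backward pass


def mean_abs_cosine(X, G, tiny=1e-12):
    """Assumed form of c_img: mean over samples of |cos(x_n, grad_n)| on flattened inputs."""
    num = np.abs(np.sum(X * G, axis=1))
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(G, axis=1) + tiny
    return float(np.mean(num / den))


def fgsm(X, G, epsilon, sign=+1):
    """+FGSM (sign=+1) steps along sign(grad); -FGSM (sign=-1) steps against it."""
    return X + sign * epsilon * np.sign(G)


# Toy usage: 32 random flattened 3x32x32 "images" and a random 10-class linear model.
rng = np.random.default_rng(0)
W, b = 0.01 * rng.normal(size=(10, 3072)), np.zeros(10)
X, y = rng.uniform(0.0, 1.0, size=(32, 3072)), rng.integers(0, 10, size=32)
loss, G = loss_and_input_grad(W, b, X, y)
print("Clean:", loss.mean(), mean_abs_cosine(X, G))
for label, s in (("+FGSM", +1), ("-FGSM", -1)):
    X_adv = fgsm(X, G, epsilon=8 / 255, sign=s)
    loss_adv, G_adv = loss_and_input_grad(W, b, X_adv, y)
    print(label, loss_adv.mean(), mean_abs_cosine(X_adv, G_adv))
```

Real images would normally be clipped to the valid pixel range after the FGSM step; the clip is omitted to keep the sketch minimal. Because cross-entropy on linear logits is convex in the input, the $+$FGSM step cannot decrease the loss in this toy model.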

Theorems & Definitions (8)

  • Theorem 3.1: Neural Uncertainty Relation
  • Lemma 3.1: Mixed-axis form of NUP
  • Theorem 3.2: Covariance reduction under the loss-induced state
  • Lemma 3.2: Directional-probe correlation and cosine
  • Proposition 1: Exp. 1 prediction: a "high-cosine tail" marks hard/fragile vision samples
  • Proposition 2: Exp. 2 prediction: $\pm$FGSM changes CC-Probe in the expected direction
  • Proposition 3: Exp. 3–4 prediction: training that suppresses strong $x\!\cdot\!p$ coupling yields more robust models
  • Proposition 4: Exp. 5–6 prediction: in LLM prefill, a low CC-Probe value indicates under-conditioning and higher hallucination risk
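The mixed-axis form (Lemma 3.1, Figure 1) can be illustrated under the same textbook assumptions ($\hat{p}_u=-i\,d/dx$ on a grid, real $\lambda$). Minimizing $\mathrm{Var}(\hat{x}_u+\lambda\hat{p}_u)=\mathrm{Var}\,\hat{x}_u+2\lambda\,\mathrm{Cov}+\lambda^2\,\mathrm{Var}\,\hat{p}_u$ gives $\lambda^\star=-\mathrm{Cov}/\mathrm{Var}\,\hat{p}_u$ and $(\Delta\hat{m}_u^\star)^2=\mathrm{Var}\,\hat{x}_u-\mathrm{Cov}^2/\mathrm{Var}\,\hat{p}_u$, where $\mathrm{Cov}=\tfrac{1}{2}\langle\{\hat{x}_u,\hat{p}_u\}\rangle-\langle\hat{x}_u\rangle\langle\hat{p}_u\rangle$. The Robertson–Schrödinger inequality $\mathrm{Var}\,\hat{x}_u\,\mathrm{Var}\,\hat{p}_u-\mathrm{Cov}^2\ge\tfrac14$ then gives $\Delta\hat{m}_u^\star\,\Delta\hat{p}_u\ge\tfrac12$. This is one natural reading of the lemma, not necessarily the paper's proof, and the helper names are hypothetical. The chirped Gaussian below has $\Delta\hat{x}_u\Delta\hat{p}_u>\tfrac12$ yet saturates the mixed-axis bound, showing how input-gradient correlation can hide inside the plain product.

```python
import numpy as np


def moments(psi, x):
    """Var(x), Var(p) and the symmetrized Cov(x, p) for psi on a uniform grid, with p = -i d/dx."""
    h = x[1] - x[0]
    psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * h)   # normalize on the grid
    dpsi = np.gradient(psi, h)

    def inner(a, c):
        """Discrete inner product <a|c> on the grid."""
        return np.sum(np.conj(a) * c) * h

    mean_x = np.real(inner(psi, x * psi))
    mean_p = np.real(inner(psi, -1j * dpsi))
    var_x = np.real(inner(psi, x ** 2 * psi)) - mean_x ** 2
    var_p = np.real(inner(dpsi, dpsi)) - mean_p ** 2
    # Re<x p> equals (1/2)<{x, p}> because (x p)^dagger = p x.
    cov = np.real(inner(psi, x * (-1j) * dpsi)) - mean_x * mean_p
    return var_x, var_p, cov


def min_mixed_dispersion(var_x, var_p, cov):
    """Minimize Var(x + lam p) = var_x + 2 lam cov + lam^2 var_p over real lam."""
    lam_star = -cov / var_p
    return lam_star, np.sqrt(var_x - cov ** 2 / var_p)


x = np.linspace(-15.0, 15.0, 6001)
a = 2.0                                               # chirp strength: couples x and p
psi = np.exp(-x ** 2 / 2 + 1j * a * x ** 2 / 2)       # chirped Gaussian
var_x, var_p, cov = moments(psi, x)
lam_star, dm_star = min_mixed_dispersion(var_x, var_p, cov)
print("dx*dp  =", np.sqrt(var_x * var_p))             # about 1.118 for a = 2
print("dm*dp  =", dm_star * np.sqrt(var_p))           # about 0.5: the mixed-axis bound is saturated
```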