Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks

Shihao Li, Jiachen Li, Dongmei Chen

TL;DR

This work addresses the fragility of data attribution under distributional shifts by proposing certified robust attribution via Wasserstein DRO. It reveals a spectral amplification barrier in deep networks under Euclidean geometry and introduces the Natural Wasserstein metric, grounded in the model's feature covariance, to obtain non-vacuous certificates. The key theoretical contribution is the Self-Influence measure, shown to equal the attribution Lipschitz constant under the Natural metric, linking leverage-based anomaly detection to robustness. Empirically, Natural W-TRAK yields non-vacuous certificates (68.7% of ranking pairs on CIFAR-10 with ResNet-18) and enables effective identification of mislabeled data (0.970 AUROC for label-noise detection), providing both practical data valuation and a principled robustness guarantee for attribution in deep networks.

Abstract

Data attribution methods identify which training examples are responsible for a model's predictions, but their sensitivity to distributional perturbations undermines practical reliability. We present a unified framework for certified robust attribution that extends from convex models to deep networks. For convex settings, we derive Wasserstein-Robust Influence Functions (W-RIF) with provable coverage guarantees. For deep networks, we demonstrate that Euclidean certification is rendered vacuous by spectral amplification -- a mechanism where the inherent ill-conditioning of deep representations inflates Lipschitz bounds by over $10{,}000\times$. This explains why standard TRAK scores, while accurate point estimates, are geometrically fragile: naive Euclidean robustness analysis yields 0% certification. Our key contribution is the Natural Wasserstein metric, which measures perturbations in the geometry induced by the model's own feature covariance. This eliminates spectral amplification, reducing worst-case sensitivity by $76\times$ and stabilizing attribution estimates. On CIFAR-10 with ResNet-18, Natural W-TRAK certifies 68.7% of ranking pairs compared to 0% for Euclidean baselines -- to our knowledge, the first non-vacuous certified bounds for neural network attribution. Furthermore, we prove that the Self-Influence term arising from our analysis equals the Lipschitz constant governing attribution stability, providing theoretical grounding for leverage-based anomaly detection. Empirically, Self-Influence achieves 0.970 AUROC for label noise detection, identifying 94.1% of corrupted labels by examining just the top 20% of training data.
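
The contrast the abstract draws between Euclidean and Natural geometry can be illustrated with a small NumPy sketch. It is not the authors' implementation: the features are synthetic stand-ins for network representations, the covariance is taken as $Q = \Phi^\top \Phi / n$ plus a small ridge term, and Self-Influence is assumed to be the leverage-style quantity $\phi^\top Q^{-1} \phi$, a definition this summary does not spell out. The names `natural_geometry` and `self_influence` are hypothetical.

```python
import numpy as np

def natural_geometry(features, ridge=1e-8):
    """Eigenvalues of the ridge-regularized feature covariance Q, and Q^{-1/2}."""
    n, d = features.shape
    Q = features.T @ features / n + ridge * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    # Q^{-1/2} = V diag(1 / sqrt(lambda)) V^T
    inv_sqrt_Q = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return eigvals, inv_sqrt_Q

def self_influence(phi, inv_sqrt_Q):
    """Assumed leverage-style score phi^T Q^{-1} phi (squared norm of whitened phi)."""
    whitened = phi @ inv_sqrt_Q
    return np.sum(whitened ** 2, axis=-1)

# Synthetic, ill-conditioned features (an assumption, not ResNet-18 activations).
rng = np.random.default_rng(0)
n, d = 2000, 32
scales = np.logspace(0, -2.5, d)  # per-direction standard deviations
features = rng.standard_normal((n, d)) * scales

eigvals, inv_sqrt_Q = natural_geometry(features)
condition_number = eigvals[-1] / eigvals[0]
# Euclidean view: applying Q^{-1} scales the weakest direction by 1 / lambda_min.
euclidean_amplification = 1.0 / eigvals[0]

# Natural view: after whitening by Q^{-1/2} the covariance is close to the
# identity, so no direction is amplified more than any other.
whitened = features @ inv_sqrt_Q
natural_cov = whitened.T @ whitened / n

scores = self_influence(features, inv_sqrt_Q)
# A point displaced along the lowest-variance direction gets a large score.
outlier = np.zeros(d)
outlier[-1] = 0.1
outlier_score = self_influence(outlier, inv_sqrt_Q)

print(f"condition number of Q: {condition_number:.3g}")
print(f"Euclidean amplification 1/lambda_min: {euclidean_amplification:.3g}")
print(f"mean self-influence: {scores.mean():.3f}, outlier: {outlier_score:.3g}")
```

Under these assumptions the mean Self-Influence over the training features is close to the feature dimension, while a point that leaves the training distribution along a low-variance direction scores far higher, which is the intuition behind using the score as an anomaly detector.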

Paper Structure

This paper contains 57 sections, 7 theorems, 33 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Proposition 3.4

Let $Q_t = (1-t)P_n + t\delta_z$ for $t \in [0, 1]$. Under the regularity conditions:

Figures (7)

  • Figure 1: The Robustness Gap: Convex vs. Non-Convex. Left: In convex optimization, the loss landscape has a unique minimum. The linear approximation (tangent plane) accurately predicts parameter changes under data perturbations, enabling tight certified regions. Right: In non-convex landscapes (neural networks), the loss surface contains multiple local minima. A small distributional perturbation can cause the optimization trajectory to "basin hop" to a different minimum, a global change that local linear approximations cannot predict [basu2020influence, dinh2017sharp]. This renders classical W-RIF intervals vacuous, motivating the need for TRAK's linearization approach.
  • Figure 2: Spectral amplification in ResNet-18 features. The feature covariance matrix $Q$ has condition number $\kappa(Q) = 2.71 \times 10^5$, with eigenvalues spanning 5 orders of magnitude. The Euclidean Lipschitz constant is implicitly amplified by $1/\lambda_{\min}$, reaching $10{,}000\times$ amplification. The Natural metric (red) maintains a constant amplification factor of 1.0 across the entire spectrum.
  • Figure 3: The OOD Barrier. Self-Influence of test points vs. their distance to the training manifold. Points far from the training distribution (right side) have inflated Self-Influence, indicating larger Lipschitz constants and weaker certification. This is a feature, not a bug: attribution for OOD points is inherently less stable, and our framework correctly identifies this.
  • Figure 4: Certification Frontier: Natural vs. Euclidean Geometry. Percentage of ranking pairs certified as stable under distributional perturbations. Euclidean W-TRAK (red) certifies 0% of pairs because its intervals are so wide that all rankings overlap. Natural W-TRAK (blue) certifies 68.7% of pairs.
  • Figure 5: Self-Influence as Theoretically Grounded Anomaly Detector. Left: ROC curve for detecting mislabeled training points using Self-Influence, achieving 0.970 AUROC. Center: Precision-Recall curve (0.796 AP). Right: Distribution of Self-Influence for clean vs. corrupted samples, showing $5.12\times$ separation in means.
  • ...and 2 more figures
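
The "percentage of ranking pairs certified" reported for Figure 4 can be read as an interval-overlap check. The sketch below assumes a pair of training points is certified when their robust attribution intervals $[s_i - r_i, s_i + r_i]$ are disjoint. That criterion is inferred from the caption's remark about overlapping intervals, not taken from the paper's definition, and the scores and radii are made-up illustrative numbers, not results. The function name `certified_pair_fraction` is hypothetical.

```python
import numpy as np

def certified_pair_fraction(scores, radii):
    """Fraction of pairs (i, j), i < j, whose intervals [s - r, s + r] are disjoint."""
    scores = np.asarray(scores, dtype=float)
    radii = np.asarray(radii, dtype=float)
    i, j = np.triu_indices(len(scores), k=1)  # all unordered pairs
    gap = np.abs(scores[i] - scores[j])
    certified = gap > radii[i] + radii[j]     # disjoint intervals fix the ranking
    return certified.mean()

# Illustrative attribution scores for four training points (not paper data).
scores = [3.0, 1.0, 0.2, -2.0]
wide = certified_pair_fraction(scores, [10.0] * 4)   # very loose bounds
tight = certified_pair_fraction(scores, [0.5] * 4)   # tighter bounds

print(f"certified with wide intervals:  {wide:.3f}")
print(f"certified with tight intervals: {tight:.3f}")
```

With wide radii every interval overlaps every other and no pair is certified, mirroring the Euclidean case in the caption. Shrinking the radii, as a better-conditioned metric would under this reading, certifies most pairs.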

Theorems & Definitions (20)

  • Remark 3.1: Regularization Convention
  • Definition 3.2: Regularity Conditions
  • Definition 3.3: Wasserstein-Robust Influence
  • Proposition 3.4: Parameter Sensitivity
  • proof
  • Definition 3.5: Complete Sensitivity Kernel
  • Remark 3.6: Interpretation
  • Theorem 3.7: W-RIF Closed Form
  • proof
  • Lemma 3.8: LOO as Distributional Perturbation
  • ...and 10 more