A Unified Theory of Random Projection for Influence Functions

Pingbang Hu, Yuzheng Hu, Jiaqi W. Ma, Han Zhao

TL;DR

This work develops a principled theory of when random projection faithfully preserves influence scores of the form $\tau_{\lambda}(g,g')=g^{\top}(F+\lambda I)^{-1}g'$ in large-scale models. In the unregularized regime ($\lambda=0$) it establishes a sharp barrier: the sketch must be injective on $\mathrm{range}(F)$, which forces $m\ge r=\mathrm{rank}(F)$. With ridge regularization, this barrier is replaced by a condition on the effective dimension $d_{\lambda}(F)=\mathrm{tr}(F(F+\lambda I)^{-1})$, yielding a sketch size of $m=\Omega((d_{\lambda}(F)+\log(1/\delta))/\varepsilon^2)$ for accuracy $\varepsilon$ and failure probability $\delta$. The authors extend the theory to Kronecker-factored curvatures, showing that factorized sketches require $m_A$ and $m_E$ governed by $d_{\lambda_E}(A)$ and $d_{\lambda_A}(E)$, with a multiplicative trade-off in sketch size. They also quantify the leakage that arises when test gradients lie outside $\mathrm{range}(F)$ and provide finite-sample, high-probability guarantees for both single and multiple test gradients, including factorized leakage. Experiments on MNIST and CIFAR complement the theory, and the results offer actionable guidance for choosing the sketch size and regularization to balance faithfulness and utility in scalable data attribution.
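To make the effective-dimension quantity concrete, here is a minimal sketch that evaluates $d_{\lambda}(F)=\sum_i s_i/(s_i+\lambda)$ from the eigenvalues $s_i$ of $F$ and plugs it into the stated rate. The function names, the example spectrum, and the constant `c` are assumptions made for illustration; the summary only gives the rate up to an unspecified constant.

```python
import math

def effective_dimension(eigenvalues, lam):
    """d_lambda(F) = tr(F (F + lam I)^{-1}) = sum_i s_i / (s_i + lam)."""
    return sum(s / (s + lam) for s in eigenvalues)

def suggested_sketch_size(eigenvalues, lam, eps, delta, c=1.0):
    """Evaluate c * (d_lambda + log(1/delta)) / eps^2.

    The constant c is a placeholder (assumption): the stated bound only
    fixes the scaling, not the constant.
    """
    d_lam = effective_dimension(eigenvalues, lam)
    return math.ceil(c * (d_lam + math.log(1.0 / delta)) / eps ** 2)

# Hypothetical polynomially decaying spectrum with 1000 nonzero eigenvalues.
spectrum = [1.0 / (i + 1) ** 2 for i in range(1000)]

for lam in (1e-4, 1e-2):
    d_lam = effective_dimension(spectrum, lam)
    m = suggested_sketch_size(spectrum, lam, eps=0.5, delta=0.05)
    print(f"lambda={lam:g}  d_lambda={d_lam:.1f}  suggested m={m}")
```

On a decaying spectrum like this one, increasing $\lambda$ shrinks $d_{\lambda}(F)$ well below the rank, which is why the regularized requirement can be far smaller than $m\ge r$.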

Abstract

Influence functions and related data attribution scores take the form of $g^{\top}F^{-1}g^{\prime}$, where $F\succeq 0$ is a curvature operator. In modern overparameterized models, forming or inverting $F\in\mathbb{R}^{d\times d}$ is prohibitive, motivating scalable influence computation via random projection with a sketch $P \in \mathbb{R}^{m\times d}$. This practice is commonly justified via the Johnson-Lindenstrauss (JL) lemma, which ensures approximate preservation of Euclidean geometry for a fixed dataset. However, JL does not address how sketching behaves under inversion. Furthermore, there is no existing theory that explains how sketching interacts with other widely used techniques, such as ridge regularization and structured curvature approximations. We develop a unified theory characterizing when projection provably preserves influence functions. When $g,g^{\prime}\in\mathrm{range}(F)$, we show that: 1) Unregularized projection: exact preservation holds iff $P$ is injective on $\mathrm{range}(F)$, which necessitates $m\geq \mathrm{rank}(F)$; 2) Regularized projection: ridge regularization fundamentally alters the sketching barrier, with approximation guarantees governed by the effective dimension of $F$ at the regularization scale; 3) Factorized influence: for Kronecker-factored curvatures $F=A\otimes E$, the guarantees continue to hold for decoupled sketches $P=P_A\otimes P_E$, even though such sketches exhibit row correlations that violate i.i.d. assumptions. Beyond this range-restricted setting, we analyze out-of-range test gradients and quantify a leakage term that arises when test gradients have components in $\ker(F)$. This yields guarantees for influence queries on general test points. Overall, this work develops a novel theory that characterizes when projection provably preserves influence and provides principled guidance for choosing the sketch size in practice.

Paper Structure

This paper contains 49 sections, 24 theorems, 233 equations, and 4 figures.

Key Result

Theorem 1

The equality $\tau_0(g, g^{\prime}) = \widetilde{\tau}_0(g, g^{\prime})$ holds for all $g, g^{\prime} \in \mathrm{range}(F)$ iff $P$ is injective on $\mathrm{range}(F)$, i.e., $\mathrm{rank}(PU) = \mathrm{rank}(F) = r$, where $F=U\La
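The statement of Theorem 1 can be checked numerically on a small synthetic problem. The sketch below is illustrative only. It assumes the sketched score is $\widetilde{\tau}_0(g,g^{\prime}) = (Pg)^{\top}(PFP^{\top})^{\dagger}(Pg^{\prime})$ with a Gaussian $P$, a definition that this summary does not spell out, and it uses a pseudoinverse because $F$ is rank-deficient. All names, sizes, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 4  # ambient dimension and rank of F (illustrative sizes)

# Rank-r curvature F = U diag(lam) U^T with orthonormal columns U (d x r).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
lam = rng.uniform(1.0, 2.0, size=r)
F = (U * lam) @ U.T

# Two gradients that lie inside range(F).
g = U @ rng.standard_normal(r)
g_prime = U @ rng.standard_normal(r)

def influence(F, g, g_prime):
    """Unregularized score g^T F^+ g', using a pseudoinverse for singular F."""
    F_pinv = np.linalg.pinv(F, rcond=1e-10, hermitian=True)
    return float(g @ F_pinv @ g_prime)

def sketched_influence(P, F, g, g_prime):
    """Assumed sketched score: (Pg)^T (P F P^T)^+ (Pg')."""
    return influence(P @ F @ P.T, P @ g, P @ g_prime)

exact = influence(F, g, g_prime)

# m >= r: a Gaussian sketch is injective on range(F) with probability 1.
P_big = rng.standard_normal((8, d))
rank_big = np.linalg.matrix_rank(P_big @ U)
tau_big = sketched_influence(P_big, F, g, g_prime)

# m < r: the sketch cannot be injective on range(F).
P_small = rng.standard_normal((2, d))
tau_small = sketched_influence(P_small, F, g, g_prime)

print("rank(P U) with m=8:", rank_big)
print("error with m=8:", abs(exact - tau_big))
print("error with m=2:", abs(exact - tau_small))
```

With $m=8\ge r$ the sketched and exact scores agree up to floating-point error, while with $m=2<r$ they generally differ, consistent with the requirement $\mathrm{rank}(PU)=r$.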

Figures (4)

  • Figure 1: Ordered spectrum $\lambda_i$ of the empirical Fisher $F$.
  • Figure 2: Approximation error versus normalized sketch size.
  • Figure 3: Approximation error and LDS versus $\lambda$.
  • Figure 4: Left: selecting $\lambda^{\ast}$ on a validation set using large $m$. Right: held-out test LDS versus $m / d_{\lambda^{\ast}}(F)$.

Theorems & Definitions (27)

  • Theorem 1: Barrier of unregularized projection
  • Theorem 2: Upper bound of regularized projection
  • Remark 3
  • Theorem 4: Lower bound for regularized projection
  • Theorem 5: Barrier of unregularized projection for factorized influence
  • Theorem 6: Upper bound of regularized projection for factorized influence
  • Remark 7
  • Theorem 8
  • Theorem 9
  • Remark 10
  • ...and 17 more