A Unified Theory of Random Projection for Influence Functions

Pingbang Hu; Yuzheng Hu; Jiaqi W. Ma; Han Zhao

TL;DR

This work provides a principled theory for when random projection faithfully preserves influence scores of the form $\tau_{\lambda}(g,g')=g^{\top}(F+\lambda I)^{-1}g'$ in large-scale models. In the unregularized regime it identifies a sharp barrier: the sketch must be injective on $\mathrm{range}(F)$, which requires $m\ge r=\mathrm{rank}(F)$. Ridge regularization replaces this barrier with a bound in terms of the effective dimension $d_{\lambda}(F)=\mathrm{tr}(F(F+\lambda I)^{-1})$, yielding $m=\Omega((d_{\lambda}(F)+\log(1/\delta))/\varepsilon^2)$. The authors extend the theory to Kronecker-factored curvatures, showing that factorized sketches require $m_A$ and $m_E$ governed by $d_{\lambda_E}(A)$ and $d_{\lambda_A}(E)$, with a multiplicative trade-off in sketch size. They also quantify the leakage that arises when test gradients lie outside $\mathrm{range}(F)$ and provide finite-sample, high-probability guarantees for both single and multiple test gradients, including factorized leakage. Complemented by experiments on MNIST and CIFAR, the results offer actionable guidance for choosing the sketch size and regularization strength to balance faithfulness and utility in scalable data attribution.
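The leakage effect mentioned above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the sketched score takes the form $(Pg)^{\top}(PFP^{\top}+\lambda I_m)^{-1}(Pg')$ with a Gaussian sketch $P$, and the dimensions, spectrum, and helper names (`tau`, `tau_sketched`) are hypothetical. For a training gradient in $\mathrm{range}(F)$ and a test component in $\ker(F)$, the exact score vanishes, while the sketched score generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m, lam = 40, 6, 20, 0.1  # hypothetical sizes and ridge parameter

# Low-rank PSD curvature F = U diag(evals) U^T with rank r.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
U, U_perp = Q[:, :r], Q[:, r:]  # orthonormal bases of range(F) and ker(F)
evals = np.linspace(1.0, 2.0, r)
F = (U * evals) @ U.T


def tau(g, g_prime, F, lam):
    """Exact regularized influence g^T (F + lam I)^{-1} g'."""
    return g @ np.linalg.solve(F + lam * np.eye(F.shape[0]), g_prime)


def tau_sketched(g, g_prime, F, lam, P):
    """Assumed sketched form: (Pg)^T (P F P^T + lam I_m)^{-1} (P g')."""
    m = P.shape[0]
    return (P @ g) @ np.linalg.solve(P @ F @ P.T + lam * np.eye(m), P @ g_prime)


g = U @ rng.standard_normal(r)               # training gradient in range(F)
g_ker = U_perp @ rng.standard_normal(d - r)  # test-gradient component in ker(F)
P = rng.standard_normal((m, d)) / np.sqrt(m)  # Gaussian sketch (assumption)

exact_leak = tau(g, g_ker, F, lam)           # zero up to round-off
sketched_leak = tau_sketched(g, g_ker, F, lam, P)
print(f"exact: {exact_leak:.2e}  sketched: {sketched_leak:.2e}")
```

The exact value is zero because $(F+\lambda I)^{-1}$ maps $\ker(F)$ to itself, so the kernel component stays orthogonal to $g$. The sketch does not preserve that orthogonality, which is one way to see why a separate leakage term is needed for general test points.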

Abstract

Influence functions and related data attribution scores take the form of $g^{\top}F^{-1}g^{\prime}$, where $F\succeq 0$ is a curvature operator. In modern overparameterized models, forming or inverting $F\in\mathbb{R}^{d\times d}$ is prohibitive, motivating scalable influence computation via random projection with a sketch $P \in \mathbb{R}^{m\times d}$. This practice is commonly justified via the Johnson-Lindenstrauss (JL) lemma, which ensures approximate preservation of Euclidean geometry for a fixed dataset. However, JL does not address how sketching behaves under inversion. Furthermore, there is no existing theory that explains how sketching interacts with other widely used techniques, such as ridge regularization and structured curvature approximations. We develop a unified theory characterizing when projection provably preserves influence functions. When $g,g^{\prime}\in\text{range}(F)$, we show that: 1) Unregularized projection: exact preservation holds iff $P$ is injective on $\text{range}(F)$, which necessitates $m\geq \text{rank}(F)$; 2) Regularized projection: ridge regularization fundamentally alters the sketching barrier, with approximation guarantees governed by the effective dimension of $F$ at the regularization scale; 3) Factorized influence: for Kronecker-factored curvatures $F=A\otimes E$, the guarantees continue to hold for decoupled sketches $P=P_A\otimes P_E$, even though such sketches exhibit row correlations that violate i.i.d. assumptions. Beyond this range-restricted setting, we analyze out-of-range test gradients and quantify a leakage term that arises when test gradients have components in $\ker(F)$. This yields guarantees for influence queries on general test points. Overall, this work develops a novel theory that characterizes when projection provably preserves influence and provides principled guidance for choosing the sketch size in practice.
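For the factorized setting, the decoupled sketch $P=P_A\otimes P_E$ never has to be formed explicitly. The minimal sketch below checks two standard Kronecker identities that make this possible. The sizes, the Gaussian factors, and the $d_A\times d_E$ layout of the per-layer gradient `G` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_A, d_E, m_A, m_E = 5, 4, 3, 2  # hypothetical factor and sketch sizes


def random_psd(n):
    """Random symmetric PSD matrix, used as a stand-in for a curvature factor."""
    M = rng.standard_normal((n, n))
    return M @ M.T


A, E = random_psd(d_A), random_psd(d_E)
P_A = rng.standard_normal((m_A, d_A)) / np.sqrt(m_A)
P_E = rng.standard_normal((m_E, d_E)) / np.sqrt(m_E)

# Dense reference objects (only feasible at toy scale).
F = np.kron(A, E)
P = np.kron(P_A, P_E)

# Mixed-product property: the sketched curvature is again Kronecker-factored.
sketched_full = P @ F @ P.T
sketched_factored = np.kron(P_A @ A @ P_A.T, P_E @ E @ P_E.T)

# Projecting a gradient: with row-major flattening of a (d_A, d_E) matrix G,
# (P_A kron P_E) vec(G) equals vec(P_A G P_E^T).
G = rng.standard_normal((d_A, d_E))
proj_full = P @ G.reshape(-1)
proj_factored = (P_A @ G @ P_E.T).reshape(-1)

same_curvature = np.allclose(sketched_full, sketched_factored)
same_projection = np.allclose(proj_full, proj_factored)
print(same_curvature, same_projection)
```

Because the rows of $P_A\otimes P_E$ are products of rows of the two factors, they are not independent even when $P_A$ and $P_E$ have i.i.d. entries, which is the row correlation the abstract refers to.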

Paper Structure (49 sections, 24 theorems, 233 equations, 4 figures)

Introduction
Setup and Notation.
Our Contributions.
Related Works
Projection-Based Influence Approximation
Unregularized Projection
Regularized Projection
Factorized Influence
Influence with Out-of-Range Test Gradients
Leakage of Projection
Leakage of Factorized Influence
Experiment and Discussion
Faithfulness-Utility Tradeoff.
Conclusion
Proofs for Section 2.1 (Unregularized Projection)
...and 34 more sections

Key Result

Theorem 1

The equality $\tau_0(g, g^{\prime}) = \widetilde{\tau}_0(g, g^{\prime})$ holds for any $g, g^{\prime} \in \mathrm{range}(F)$ iff $P$ is injective on $\mathrm{range}(F)$, i.e., $\mathrm{rank}(PU) = \mathrm{rank}(F) = r$ where $F=U\La
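The injectivity condition can be checked numerically on a toy problem. This is a minimal sketch under stated assumptions: $U$ is taken to be an orthonormal basis of $\mathrm{range}(F)$ (the statement above is cut off before defining it), the unregularized scores use pseudo-inverses, $\tau_0(g,g')=g^{\top}F^{\dagger}g'$ and $\widetilde{\tau}_0(g,g')=(Pg)^{\top}(PFP^{\top})^{\dagger}(Pg')$, and all sizes and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 30, 5  # hypothetical ambient dimension and rank

# F = U diag(evals) U^T with orthonormal U spanning range(F).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
evals = np.linspace(1.0, 2.0, r)
F = (U * evals) @ U.T


def psd_pinv(M, rel_tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    keep = w > rel_tol * w.max()
    return (V[:, keep] / w[keep]) @ V[:, keep].T


def tau0(g, g_prime):
    """Exact unregularized score g^T F^+ g'."""
    return g @ psd_pinv(F) @ g_prime


def tau0_sketched(g, g_prime, P):
    """Assumed sketched score (Pg)^T (P F P^T)^+ (P g')."""
    return (P @ g) @ psd_pinv(P @ F @ P.T) @ (P @ g_prime)


g = U @ rng.standard_normal(r)        # both gradients lie in range(F)
g_prime = U @ rng.standard_normal(r)

P_big = rng.standard_normal((8, d)) / np.sqrt(8)    # m = 8 >= r
P_small = rng.standard_normal((3, d)) / np.sqrt(3)  # m = 3 <  r

rank_big = np.linalg.matrix_rank(P_big @ U)
rank_small = np.linalg.matrix_rank(P_small @ U)
err_big = abs(tau0(g, g_prime) - tau0_sketched(g, g_prime, P_big))
err_small = abs(tau0(g, g_prime) - tau0_sketched(g, g_prime, P_small))
print(f"m=8: rank(PU)={rank_big}, error={err_big:.2e}")
print(f"m=3: rank(PU)={rank_small}, error={err_small:.2e}")
```

A Gaussian sketch with $m\ge r$ rows is injective on $\mathrm{range}(F)$ with probability one, so the first error sits at round-off level. With $m<r$ the rank condition cannot hold, and the sketched score generally differs from the exact one.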

Figures (4)

Figure 1: Ordered spectrum $\lambda_i$ of the empirical Fisher $F$.
Figure 2: Approximation error versus normalized sketch size.
Figure 3: Approximation error and LDS versus $\lambda$.
Figure 4: Left: selecting $\lambda^{\ast}$ on a validation set using large $m$. Right: held-out test LDS versus $m / d_{\lambda^{\ast}}(F)$.
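The normalized sketch size $m/d_{\lambda}(F)$ on the figure axes depends on the effective dimension, which follows directly from the eigenvalues of $F$. A minimal sketch, assuming a hypothetical polynomially decaying spectrum (not the spectrum shown in Figure 1) and a hypothetical sketch size `m`:

```python
import numpy as np


def effective_dimension(eigvals, lam):
    """d_lam(F) = tr(F (F + lam I)^{-1}) = sum_i s_i / (s_i + lam), for lam > 0."""
    s = np.asarray(eigvals, dtype=float)
    return float(np.sum(s / (s + lam)))


# Hypothetical spectrum with polynomial decay.
spectrum = 1.0 / np.arange(1, 1001) ** 2

# Check the eigenvalue formula against the trace definition on a small matrix.
F_small = np.diag(spectrum[:50])
lam_check = 1e-3
trace_form = np.trace(F_small @ np.linalg.inv(F_small + lam_check * np.eye(50)))
formula_matches = bool(
    np.isclose(trace_form, effective_dimension(spectrum[:50], lam_check))
)

m = 64  # hypothetical sketch size
d_lams = [effective_dimension(spectrum, lam) for lam in (1e-1, 1e-3, 1e-5)]
for lam, d_lam in zip((1e-1, 1e-3, 1e-5), d_lams):
    print(f"lam={lam:.0e}  d_lam={d_lam:8.2f}  m/d_lam={m / d_lam:6.2f}")
```

The effective dimension grows as $\lambda$ shrinks and never exceeds $\mathrm{rank}(F)$, so a fixed $m$ corresponds to a smaller normalized sketch size at weaker regularization.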

Theorems & Definitions (27)

Theorem 1: Barrier of unregularized projection
Theorem 2: Upper bound of regularized projection
Remark 3
Theorem 4: Lower bound for regularized projection
Theorem 5: Barrier of unregularized projection for factorized influence
Theorem 6: Upper bound of regularized projection for factorized influence
Remark 7
Theorem 8
Theorem 9
Remark 10
...and 17 more
