Training Data Attribution via Approximate Unrolled Differentiation

Juhan Bae; Wu Lin; Jonathan Lorraine; Roger Grosse

TL;DR

This work tackles training data attribution (TDA) in modern neural networks, where implicit-differentiation methods assume a converged, unique optimum and unrolling-based methods are costly for large models or multi-stage training. It introduces Source, a segmented, stationary approximation to unrolling that estimates the total derivative of the final parameters with respect to downweighting a training point, yielding an influence-function-like estimator without storing all training checkpoints. By partitioning the training trajectory into segments and treating the Hessian and gradient distributions within each segment as stationary, Source derives a closed-form expression that combines per-segment contributions through matrix functions and a damped-inverse-Hessian-like term; an EK-FAC parameterization keeps the Hessian computations scalable. Empirically, Source outperforms existing TDA techniques on diverse tasks, particularly when models are non-converged or trained in multiple stages, and offers a practical middle ground between influence functions and full unrolling in computational cost. This makes it a candidate tool for data provenance, debugging, and dataset curation in complex training pipelines and large-scale models.

Abstract

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges. In this work, we connect the implicit-differentiation-based and unrolling-based approaches and combine their benefits by introducing Source, an approximate unrolling-based TDA method that is computed using an influence-function-like formula. While being computationally efficient compared to unrolling-based approaches, Source is suitable in cases where implicit-differentiation-based approaches struggle, such as in non-converged models and multi-stage training pipelines. Empirically, Source outperforms existing TDA techniques in counterfactual prediction, especially in settings where implicit-differentiation-based approaches fall short.
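The contrast the abstract draws between implicit differentiation and unrolling can be illustrated on a toy problem. The following is a minimal sketch, not the Source algorithm: it assumes full-batch gradient descent on a least-squares objective, and the function names (`train`, `unrolled_derivative`, `implicit_derivative`) and the convention that training point `m` carries weight `1 - eps` are hypothetical choices made for illustration. It propagates the derivative of the parameters with respect to `eps` through every optimization step (unrolling) and compares it with the influence-function-style formula evaluated at the final parameters.

```python
import numpy as np

def train(X, y, m, eps, lr, steps):
    """Full-batch gradient descent on the mean of 0.5 * (x @ theta - y)**2,
    with training point m weighted by (1 - eps). Illustrative setup only."""
    n, d = X.shape
    w = np.ones(n)
    w[m] -= eps
    theta = np.zeros(d)
    for _ in range(steps):
        theta = theta - lr * (X.T @ (w * (X @ theta - y)) / n)
    return theta

def unrolled_derivative(X, y, m, lr, steps):
    """Return theta_T and d(theta_T)/d(eps) at eps = 0 by differentiating
    through every update step (forward-mode unrolling)."""
    n, d = X.shape
    H = X.T @ X / n                      # Hessian of the mean loss (constant here)
    theta = np.zeros(d)
    dtheta = np.zeros(d)                 # the initialization does not depend on eps
    for _ in range(steps):
        g_m = X[m] * (X[m] @ theta - y[m])           # gradient of point m at theta_k
        dtheta = dtheta - lr * (H @ dtheta) + (lr / n) * g_m
        theta = theta - lr * (X.T @ (X @ theta - y) / n)
    return theta, dtheta

def implicit_derivative(X, y, m, theta):
    """Influence-function-style estimate H^{-1} g_m / n, which is exact only
    when theta is the unique minimizer."""
    n = X.shape[0]
    H = X.T @ X / n
    g_m = X[m] * (X[m] @ theta - y[m])
    return np.linalg.solve(H, g_m) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
m, lr = 0, 0.1

for steps in (5, 2000):
    theta, d_unrolled = unrolled_derivative(X, y, m, lr, steps)
    d_implicit = implicit_derivative(X, y, m, theta)
    gap = np.linalg.norm(d_unrolled - d_implicit)
    print(f"steps={steps:5d}  |unrolled - implicit| = {gap:.2e}")
```

In this toy setting the two estimates agree once training has converged and disagree after only a few steps, which is the regime the abstract describes as difficult for implicit-differentiation-based methods. The sketch stores nothing beyond the running derivative because the problem is tiny; it says nothing about how Source avoids the memory cost of unrolling at scale.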

Paper Structure (56 sections, 40 equations, 15 figures, 3 tables)

Introduction
Background
Training Data Attribution
Influence Functions
Evaluation of TDA Techniques
Linear Datamodeling Score (LDS).
Subset Removal Counterfactual Evaluation.
Downstream Task Evaluation.
Methods
Motivation: Unrolling for Training Data Attribution
Segmenting the Training Trajectory
Approximation of $\mathbb{E} [\mathbf{S}_\ell]$.
Approximation of $\mathbb{E}[\mathbf{r}_\ell]$.
Full Procedure
Practical Algorithm for SOURCE
...and 41 more sections

Figures (15)

Figure 1: A simplified illustration of unrolled differentiation in SGD with a batch size of $1$ and a data point of interest $\boldsymbol{z}_m$ appearing once in training at iteration $k$. The highlighted nodes in the box represent the computation graph with the update rule from \ref{eq:param_sgd}, where $B = 1$ and $\boldsymbol{z}_k = \boldsymbol{z}_m$. Unrolling backpropagates through the optimization steps from $\boldsymbol{\theta}_T$ to compute the total derivative with respect to $\epsilon$, requiring all parameter vectors from $k$ to $T$ to be saved in memory.
Figure 2: Illustrative comparison of influence functions and unrolling-based TDA. Each contour represents the cost function at different values of $\epsilon$, which controls the degree of downweighting a data point $\boldsymbol{z}_m$.
Figure 3: A demonstration of the match in qualitative behavior between $F_{\mathbf{r}}$ and $F_{\rm inv}$, where we set $\bar{\eta}_\ell = 0.1$ and $K_\ell = 100$.
Figure 4: A simplified illustration of Source with $3$ segments ($L = 3$), as defined in \ref{eq:unif_segment}. Source divides the training trajectory into one or more segments and approximates the gradient $\bar{\mathbf{g}}_\ell$ and Hessian $\bar{\mathbf{H}}_\ell$ distributions as stationary with a fixed learning rate $\bar{\eta}_\ell$ within each segment $\ell$. Compared to unrolling in \ref{fig:computation_graph}, Source does not require storing all optimization variables throughout training. Instead, it requires only a handful of checkpoints to approximate the means of the Hessians and gradients.
Figure 5: Linear datamodeling scores (LDS) across a range of data sampling ratios $\alpha$ for Source ($L \in \{1, 3\}$) and baseline TDA techniques. The LDS is measured for a single model setup, and error bars represent $95\%$ bootstrap confidence intervals.
...and 10 more figures
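
The LDS reported in Figure 5 can be sketched in a few lines, following the way the score is commonly defined in the TDA literature: sample random training subsets of size $\alpha N$, predict each subset's effect on a query by summing the attribution scores of its members, and take the Spearman rank correlation between those predictions and the outputs measured on models retrained on each subset. The sketch below is an assumption-laden illustration rather than the paper's evaluation code: the function names are hypothetical, the rank correlation ignores ties, and the "measured" outputs are synthetic stand-ins for retraining.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks.
    Assumes no ties, which holds almost surely for continuous values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def sample_subsets(n_train, alpha, n_subsets, rng):
    """Boolean masks of shape (n_subsets, n_train), each selecting
    int(alpha * n_train) training points without replacement."""
    size = int(alpha * n_train)
    masks = np.zeros((n_subsets, n_train), dtype=bool)
    for j in range(n_subsets):
        masks[j, rng.choice(n_train, size=size, replace=False)] = True
    return masks

def linear_datamodeling_score(scores, masks, measured_outputs):
    """scores: (n_train,) attribution scores for one query.
    masks: (n_subsets, n_train) subset membership.
    measured_outputs: (n_subsets,) query output of a model retrained on each subset."""
    predicted = masks.astype(float) @ scores   # additive group prediction
    return spearman(predicted, measured_outputs)

rng = np.random.default_rng(0)
n_train, alpha, n_subsets = 50, 0.5, 100
masks = sample_subsets(n_train, alpha, n_subsets, rng)

# Synthetic stand-in for retraining: outputs are linear in membership plus noise.
true_effects = rng.normal(size=n_train)
measured = masks.astype(float) @ true_effects + 0.5 * rng.normal(size=n_subsets)

noisy_scores = true_effects + rng.normal(size=n_train)   # an imperfect attribution
print("LDS (true effects):", linear_datamodeling_score(true_effects, masks, measured))
print("LDS (noisy scores):", linear_datamodeling_score(noisy_scores, masks, measured))
```

In practice the score would be averaged over many query points, and the measured outputs would come from actually retraining models on each subset, which is the expensive part of the evaluation.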
