Ablation Based Counterfactuals

Zheng Dai; David K Gifford

TL;DR

This paper addresses how to attribute diffusion model outputs to specific training data without the cost of retraining. It introduces Ablation Based Counterfactuals (ABC): an ensemble of diffusion models is trained on different but overlapping splits of the training set, and the influence of a data source is removed by ablating the ensemble components whose splits contain it. This makes it feasible to enumerate full leave-one-out counterfactual landscapes. A second contribution, differential ablation, approximates the effect of ablation with a Jacobian-based Taylor expansion, which reduces the computational burden. Empirically, ABCs are comparable to retraining-based counterfactuals on small datasets, attribution weakens as the training set grows, and some generated samples are unattributable or nearly unattributable. These findings bear on scientific interpretation and on policy for models trained on large datasets, and they expose limits of data attribution at scale.
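The first-order idea behind differential ablation can be illustrated with a toy example. The sketch below is not the paper's implementation: the `generate` function, the per-component weights, and the finite-difference Jacobian are all assumptions made for illustration. It only shows how a Jacobian with respect to component weights gives a Taylor estimate of what happens when one component is down-weighted.

```python
import numpy as np

def generate(weights, component_outputs):
    """Toy generator (hypothetical): weighted average of fixed component
    outputs, followed by a smooth nonlinearity."""
    mixed = weights @ component_outputs / weights.sum()
    return np.tanh(mixed)

def jacobian_fd(fn, w, eps=1e-6):
    """Central finite-difference Jacobian of fn at w, shape (out_dim, len(w))."""
    cols = []
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        cols.append((fn(w + step) - fn(w - step)) / (2 * eps))
    return np.stack(cols, axis=1)

# Three toy components, each producing a 2-dimensional output (made-up values).
component_outputs = np.array([[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]])
w_full = np.ones(3)  # weight 1.0 means the component is fully present

def fn(w):
    return generate(w, component_outputs)

J = jacobian_fd(fn, w_full)

# Slightly down-weight component 0 and compare the Taylor estimate
# with the exactly recomputed output.
delta = np.array([-0.1, 0.0, 0.0])
approx = fn(w_full) + J @ delta
exact = fn(w_full + delta)
```

In this toy setting `approx` and `exact` agree closely because the perturbation is small. Full ablation corresponds to a much larger step (a weight going from 1 to 0), where a first-order estimate should be expected to be rougher.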

Abstract

Diffusion models are a class of generative models that generate high-quality samples, but at present it is difficult to characterize how they depend upon their training data. This difficulty raises scientific and regulatory questions, and is a consequence of the complexity of diffusion models and their sampling process. To analyze this dependence, we introduce Ablation Based Counterfactuals (ABC), a method of performing counterfactual analysis that relies on model ablation rather than model retraining. In our approach, we train independent components of a model on different but overlapping splits of a training set. These components are then combined into a single model, from which the causal influence of any training sample can be removed by ablating a combination of model components. We demonstrate how we can construct a model like this using an ensemble of diffusion models. We then use this model to study the limits of training data attribution by enumerating full counterfactual landscapes, and show that single source attributability diminishes with increasing training data size. Finally, we demonstrate the existence of unattributable samples.
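The ablation procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the components here are stand-in functions rather than diffusion models, combining them by averaging is an assumption, and the names `ensemble_predict` and `ablate_source` are hypothetical.

```python
import numpy as np

def ablate_source(splits, source):
    """Indices of components whose training split does NOT contain `source`."""
    return {i for i, split in enumerate(splits) if source not in split}

def ensemble_predict(components, x, keep):
    """Average the outputs of the components whose index is in `keep`
    (averaging is an assumed combination rule)."""
    if not keep:
        raise ValueError("ablation removed every component")
    outputs = [components[i](x) for i in sorted(keep)]
    return np.mean(outputs, axis=0)

# Three data sources (0, 1, 2) spread over three overlapping splits.
splits = [{0, 1}, {0, 2}, {1, 2}]

# Stand-in "models": each just adds a constant to its input.
components = [lambda x, b=b: x + b for b in (1.0, 2.0, 3.0)]

x = np.zeros(2)
full = ensemble_predict(components, x, keep={0, 1, 2})

# Counterfactual "what if source 0 had never been in the training set":
# drop every component that saw source 0, with no retraining.
keep = ablate_source(splits, source=0)
counterfactual = ensemble_predict(components, x, keep)
```

Here only the third component never saw source 0, so the counterfactual output comes from that component alone, while the full output averages all three.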

Paper Structure (39 sections, 1 theorem, 1 equation, 23 figures, 4 tables)

Introduction
Related work
Our Contributions
Methods
Preliminaries
Ablation based counterfactuals
Diffusion ensembles enable ablation
Differential ablation enables efficient approximation of ablation
Results
Diffusion ensembles are viable generative models
Ablation based counterfactuals are comparable with retraining based counterfactuals
Differential ablation based attribution finds visually similar images with small training sets
Visual attribution and counterfactual attribution diverge at large training set sizes
Generated samples can be unattributable
Attributability diminishes with increasing training set size
...and 24 more sections

Key Result

Theorem

Given a data source $s$, define $S(s)$ as the set of models in the ensemble whose training split contains data produced by $s$. Then the set difference $S(s)\setminus S(s')$ is not the empty set for any distinct data sources $s$ and $s'$.
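The condition in this statement is easy to check for a concrete assignment of sources to splits. The sketch below uses one assignment scheme chosen for illustration, which is an assumption and not necessarily the paper's construction: each source gets a distinct set of model indices of the same size. Two distinct sets of equal size can never contain one another, so $S(s)\setminus S(s')$ is nonempty in both directions. The function names are hypothetical.

```python
from itertools import combinations

def assign_splits(num_sources, num_models, models_per_source):
    """Give each source a distinct set of `models_per_source` model indices.

    Returns a dict mapping source -> frozenset of model indices, i.e. S(s).
    """
    codes = combinations(range(num_models), models_per_source)
    assignment = {s: frozenset(code) for s, code in zip(range(num_sources), codes)}
    if len(assignment) < num_sources:
        raise ValueError("not enough distinct codes for the requested sources")
    return assignment

def satisfies_condition(assignment):
    """True if S(s) minus S(s') is nonempty for every ordered pair s != s'."""
    return all(
        len(assignment[s] - assignment[t]) > 0
        for s in assignment
        for t in assignment
        if s != t
    )

# Six sources over four models, each source seen by exactly two models.
good = assign_splits(num_sources=6, num_models=4, models_per_source=2)

# A counterexample: every model that saw source 1 also saw source 0,
# so S(1) minus S(0) is empty.
bad = {0: frozenset({0, 1}), 1: frozenset({0})}
```

Calling `satisfies_condition(good)` returns `True` and `satisfies_condition(bad)` returns `False`.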

Figures (23)

Figure 1: Causal flow from data sources to generated sample
Figure 2: Ensembles of diffusion models generate good quality samples
Figure 3: Comparing ABCs (ablation) to RBCs (retraining)
Figure 4: Differential ablation attributes visually similar images when training set is small
Figure 5: Visually similar training images are not necessarily influential when training set is large
...and 18 more figures

Theorems & Definitions (2)

Theorem
Definition
