Ablation Based Counterfactuals

Zheng Dai, David K Gifford

TL;DR

This paper tackles the problem of attributing diffusion model outputs to specific training data without the cost of retraining. It introduces Ablation Based Counterfactuals (ABC), implemented with ensembles of diffusion models trained on overlapping subsets of the data. Exact leave-one-out counterfactuals, and full counterfactual landscapes, are then obtained by ablating the components tied to a particular data source. A key addition is differential ablation, which approximates ABC effects with a Jacobian-based Taylor expansion to reduce computational cost. Empirically, ABCs yield counterfactuals comparable to retraining-based methods on small datasets, show that single-source attribution weakens as the training set grows, and reveal samples that are unattributable or nearly so. These findings have implications for scientific interpretation and policy in data-rich regimes. Overall, ABC offers a scalable framework for analyzing data influence in diffusion-based generation, exposes fundamental limits of data attribution at scale, and motivates new approaches to data-centric governance.
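The following sketch illustrates the general idea behind differential ablation by linearizing a toy sampler around the full ensemble. It rests on stated assumptions and is not the authors' implementation. Each hypothetical component i predicts tanh(a_i x + b_i). The ensemble combines these predictions as a weighted average with weights w_i (1 = kept, 0 = ablated). A central finite difference stands in for the Jacobian that an autodiff framework would supply. Under this setup, ablating a set of components is estimated as f(w) + J·Δw, with Δw_i = -1 for each ablated component. All names (`COMPONENTS`, `sample`, `jacobian`, and so on) are illustrative.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical component parameters: component i predicts tanh(a_i * x + b_i).
# These stand in for denoisers trained on different data splits (assumption).
COMPONENTS: List[Tuple[float, float]] = [(1.0, 0.5), (0.8, -0.3), (1.2, 0.1), (0.9, 0.4)]


def sample(weights: Sequence[float], x0: float = 0.0, steps: int = 200, eta: float = 0.1) -> float:
    """Toy iterative sampler driven by a weighted average of component predictions."""
    total = sum(weights)  # must be > 0, i.e. at least one component kept
    x = x0
    for _ in range(steps):
        pred = sum(w * math.tanh(a * x + b) for w, (a, b) in zip(weights, COMPONENTS)) / total
        x += eta * (pred - x)
    return x


def jacobian(weights: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central finite-difference derivative of the final sample w.r.t. each weight."""
    grad = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += h
        down[i] -= h
        grad.append((sample(up) - sample(down)) / (2 * h))
    return grad


def exact_ablation(ablate: Set[int]) -> float:
    """Re-run the sampler with the components in `ablate` switched off (weight 0)."""
    weights = [0.0 if i in ablate else 1.0 for i in range(len(COMPONENTS))]
    return sample(weights)


def differential_ablation(base: float, grad: Sequence[float], ablate: Set[int]) -> float:
    """First-order estimate f(w + dw) ~ f(w) + J . dw, with dw_i = -1 for ablated i."""
    return base + sum(-grad[i] for i in ablate)


full_weights = [1.0] * len(COMPONENTS)
base = sample(full_weights)
grad = jacobian(full_weights)  # computed once, reused for every ablation set
for ablate in ({0}, {1}, {0, 2}):
    print(sorted(ablate), exact_ablation(ablate), differential_ablation(base, grad, ablate))
```

Exact ablation costs one sampling run per counterfactual. The linear estimate instead reuses a single gradient for every ablation set. The trade-off is that a first-order expansion can be inaccurate when an ablation removes a large share of the ensemble weight.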

Abstract

Diffusion models are a class of generative models that produce high-quality samples, but it is currently difficult to characterize how they depend on their training data. This difficulty stems from the complexity of diffusion models and their sampling process, and it raises both scientific and regulatory questions. To analyze this dependence, we introduce Ablation Based Counterfactuals (ABC), a method of counterfactual analysis that relies on model ablation rather than model retraining. In our approach, we train independent components of a model on different but overlapping splits of a training set. These components are then combined into a single model, from which the causal influence of any training sample can be removed by ablating a combination of model components. We show how to construct such a model using an ensemble of diffusion models. We then use this model to study the limits of training data attribution by enumerating full counterfactual landscapes, and show that single-source attributability diminishes as the training set grows. Finally, we demonstrate the existence of unattributable samples.
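The construction described in the abstract can be sketched with plain sets. In this hypothetical example, three sources are spread over three components with overlapping splits. Ablating S(s), the set of components that saw source s, leaves a model whose remaining components never saw s. The source names, the split table, and the simple averaging rule are all assumptions for illustration. In a real diffusion ensemble, the combination would plausibly act on the components' denoising predictions at each sampling step rather than on a single number.

```python
from typing import Dict, List, Set

# Hypothetical split table: S(s) for each data source. Splits overlap, so each
# source feeds two of the three components.
SPLITS: Dict[str, Set[int]] = {
    "source_a": {0, 1},
    "source_b": {1, 2},
    "source_c": {0, 2},
}
NUM_COMPONENTS = 3


def counterfactual_components(source: str) -> List[int]:
    """Indices of components that remain after ablating S(source); none saw `source`."""
    ablated = SPLITS[source]
    return [i for i in range(NUM_COMPONENTS) if i not in ablated]


def ensemble_predict(preds: List[float], keep: List[int]) -> float:
    """Average the retained components' predictions (assumed combination rule)."""
    if not keep:
        raise ValueError("ablation removed every component")
    return sum(preds[i] for i in keep) / len(keep)


preds = [0.2, 0.5, 0.8]  # hypothetical per-component outputs
full = ensemble_predict(preds, list(range(NUM_COMPONENTS)))
without_a = ensemble_predict(preds, counterfactual_components("source_a"))
print(f"full ensemble: {full:.2f}, without source_a: {without_a:.2f}")
```

Because every source sits in two of the three components here, each single-source counterfactual reduces to one surviving component. A larger ensemble would keep more components after each ablation.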

Paper Structure

This paper contains 39 sections, 1 theorem, 1 equation, 23 figures, and 4 tables.

Key Result

Theorem

Given a data source $s$, define $S(s)$ as the set of models in the ensemble whose training split contains data produced by $s$. Then the set difference $S(s)\setminus S(s')$ is not the empty set for any distinct data sources $s$ and $s'$.
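One reading of this theorem is as follows. When the components in $S(s')$ are ablated to remove source $s'$, at least one component trained on $s$ survives, so the counterfactual for $s'$ never also erases $s$. The sketch below checks that condition for a given split table. It also builds one assignment that satisfies the condition by giving each source a distinct k-element subset of components. Distinct sets of the same size can never contain one another, so every difference $S(s)\setminus S(s')$ is nonempty. The k-subset scheme and all function names are hypothetical and are not necessarily the splitting procedure used in the paper.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, Hashable, Iterable


def satisfies_theorem_condition(splits: Dict[Hashable, FrozenSet[int]]) -> bool:
    """Return True if S(s) - S(s') is nonempty for every ordered pair of distinct sources."""
    for s, s_prime in permutations(splits, 2):
        if not splits[s] - splits[s_prime]:
            return False
    return True


def k_subset_assignment(
    sources: Iterable[Hashable], num_components: int, k: int
) -> Dict[Hashable, FrozenSet[int]]:
    """Hypothetical scheme: give each source its own distinct k-subset of components.

    Two different k-subsets can never be nested, so S(s) - S(s') is never empty.
    """
    subsets = combinations(range(num_components), k)
    assignment: Dict[Hashable, FrozenSet[int]] = {}
    for src in sources:
        chosen = next(subsets, None)
        if chosen is None:
            raise ValueError("more sources than distinct k-subsets of components")
        assignment[src] = frozenset(chosen)
    return assignment


splits = k_subset_assignment(["a", "b", "c", "d"], num_components=4, k=2)
print(splits, satisfies_theorem_condition(splits))
```

With 4 components and k = 2 there are C(4, 2) = 6 distinct subsets, so up to six sources fit. In general, choosing k near half the number of components maximizes C(n, k) and therefore the number of sources the ensemble can separate.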

Figures (23)

  • Figure 1: Causal flow from data sources to generated sample
  • Figure 2: Ensembles of diffusion models generate good quality samples
  • Figure 3: Comparing ABCs (ablation) to RBCs (retraining)
  • Figure 4: Differential ablation attributes visually similar images when training set is small
  • Figure 5: Visually similar training images are not necessarily influential when training set is large
  • ...and 18 more figures

Theorems & Definitions (2)

  • Theorem
  • Definition