Distilled Datamodel with Reverse Gradient Matching

Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang

TL;DR

This work tackles the challenge of attributing a pre-trained model’s behavior to its training data without costly retraining. It introduces the Distilled Datamodel (DDM), which distills data influence offline into a compact synset via reverse gradient matching and then uses online evaluation to rapidly compute an attribution matrix for various behaviors. The framework combines clustering-based data condensation with hierarchical synsets to accelerate leave-one-out analyses while preserving interpretability and privacy of the original data. Empirical results on MNIST, CIFAR-10/100, and TinyImageNet show that DDM achieves accurate data attribution and faster evaluation than exact unlearning, with robust performance across architectures and initialization schemes. The approach offers practical benefits for data quality assessment, model diagnostics, and transferability studies, with potential extensions to broader model behaviors and tasks.

Abstract

The proliferation of large-scale AI models trained on extensive datasets has revolutionized machine learning. With these models taking on increasingly central roles in various applications, the need to understand their behavior and enhance interpretability has become paramount. To investigate the impact of changes in training data on a pre-trained model, a common approach is leave-one-out retraining. This entails systematically altering the training dataset by removing specific samples to observe resulting changes within the model. However, retraining the model for each altered dataset presents a significant computational challenge, given the need to perform this operation for every dataset variation. In this paper, we introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages. During the offline training phase, we approximate the influence of training data on the target model through a distilled synset, formulated as a reversed gradient matching problem. For online evaluation, we expedite the leave-one-out process using the synset, which is then utilized to compute the attribution matrix based on the evaluation objective. Experimental evaluations, including training data attribution and assessments of data quality, demonstrate that our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
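To make the leave-one-out baseline described in the abstract concrete, the following is a minimal sketch of direct retraining, not the paper's DDM method. It assumes a toy one-parameter least-squares model whose closed-form fit stands in for training; the names `fit`, `loss`, and `leave_one_out_attribution` are hypothetical and chosen only for illustration. Entry `A[i][j]` is the change in loss on evaluation point `j` when training sample `i` is removed.

```python
# Illustrative leave-one-out attribution by direct retraining.
# Assumption: a toy model y = w * x with a closed-form fit replaces real training.

def fit(train):
    """Return the least-squares slope w for y = w * x."""
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    return sxy / sxx

def loss(w, point):
    """Squared error of the model on a single (x, y) point."""
    x, y = point
    return (w * x - y) ** 2

def leave_one_out_attribution(train, evals):
    """Build the attribution matrix with one refit per removed training sample."""
    w_full = fit(train)
    base = [loss(w_full, p) for p in evals]
    matrix = []
    for i in range(len(train)):
        # Refit on the dataset with sample i removed.
        w_i = fit(train[:i] + train[i + 1:])
        matrix.append([loss(w_i, p) - b for p, b in zip(evals, base)])
    return matrix

# Three points on the line y = 2x plus one outlier (hypothetical data).
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 0.0)]
evals = [(5.0, 10.0)]
A = leave_one_out_attribution(train, evals)
for i, row in enumerate(A):
    print(i, row)
```

A negative entry means that removing the sample lowers the evaluation loss, so the sample was harmful; here that is the outlier. The loop performs one full refit per training sample, which is the cost the abstract identifies as prohibitive when each refit is a full network training run.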

Paper Structure (29 sections, 15 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: The framework of the proposed distilled datamodel. During offline training, the synset is distilled alongside the normal training of the target network. For online evaluation, we perturb the learned synset and quickly train the set of perturbed models, from which the final attribution matrix is computed.
  • Figure 2: The proposed reverse gradient matching process. The synset is optimized by the reverse gradients.
  • Figure 3: Comparison of the training data attribution weights calculated from different network architectures. The figure shows the class-wise weights.
  • Figure 4: Visualization of condensed 10 images/class with ConvNet for MNIST (a) and CIFAR-100 (b). We compare the visualization results between gradient matching and reverse gradient matching. Each column represents a condensation of a cluster.
  • Figure 5: Visualization of condensed 10 images/class with ConvNet for the TinyImageNet dataset. We compare the visualization results between gradient matching (GM) and reverse gradient matching (DDM). In each visualization, each column represents a condensation of a cluster.
  • ...and 3 more figures