EMP: Enhance Memory in Data Pruning

Jinying Xiao, Ping Li, Jie Nie, Bin Ji, Shasha Li, Xiaodong Liu, Jun Ma, Qingbo Wu, Jie Yu

TL;DR

The paper addresses memory loss in data pruning for large models: as pruning rates rise, loss-based scoring leads to Low-Frequency Learning (LFL), in which many samples are not fitted effectively. It adds a memory term to the pruning score and develops Enhance Memory Pruning (EMP) to strengthen data memorization in both supervised and self-supervised settings, supported by a theory-grounded decomposition of cross entropy and mutual information. Experiments on image classification, language understanding, and pre-training show that EMP outperforms existing dynamic pruning methods at high pruning rates, including a 2.2% gain on CIFAR100-ResNet50 at 70% pruning. The approach can reduce training cost while preserving or improving performance; future work will explore layer-wise memory mechanisms and broader model architectures.

Abstract

Recently, large language and vision models have shown strong performance, but due to high pre-training and fine-tuning costs, research has shifted towards faster training via dataset pruning. Previous methods used sample loss as an evaluation criterion, aiming to select the most "difficult" samples for training. However, when the pruning rate increases, the number of times each sample is trained becomes more evenly distributed, so many critical or general samples are not fitted effectively. We refer to this as Low-Frequency Learning (LFL). In other words, LFL prevents the model from remembering most samples. In our work, we decompose the scoring function of LFL, provide a theoretical explanation for the inefficiency of LFL, and propose adding a memory term to the scoring function to enhance the model's memory capability, along with an approximation of this memory term. Similarly, we explore memory in Self-Supervised Learning (SSL), marking the first discussion of SSL memory. Using contrastive learning, we derive the memory term both theoretically and experimentally. Finally, we propose Enhance Memory Pruning (EMP), which addresses insufficient memory under high pruning rates by enhancing the model's memory of data, thereby improving its performance. We evaluate EMP on tasks such as image classification, natural language understanding, and model pre-training. The results show that EMP can improve model performance under extreme pruning rates. For example, in the CIFAR100-ResNet50 pre-training task, with 70% pruning, EMP outperforms current methods by 2.2%.
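
The idea of adding a memory term to a loss-based pruning score can be illustrated with a minimal sketch. The abstract does not give the exact form of the score or of the memory-term approximation, so everything here is an assumption: the additive combination `loss + beta * memory`, the top-fraction selection rule, and the function names `emp_style_scores` and `keep_indices` are all hypothetical, and the memory values are supplied by the caller rather than computed.

```python
from typing import List, Sequence


def emp_style_scores(
    losses: Sequence[float],
    memory_terms: Sequence[float],
    beta: float,
) -> List[float]:
    """Combine per-sample loss with a memory term.

    Assumption: the two are combined additively, with beta scaling the
    memory term. The memory values themselves are placeholders; the
    paper's approximation of the memory term is not reproduced here.
    """
    if len(losses) != len(memory_terms):
        raise ValueError("losses and memory_terms must have the same length")
    return [loss + beta * mem for loss, mem in zip(losses, memory_terms)]


def keep_indices(scores: Sequence[float], pruning_rate: float) -> List[int]:
    """Return sorted indices of the highest-scoring samples.

    Assumption: a (1 - pruning_rate) fraction of samples is kept by a
    deterministic top-k rule; real dynamic pruning methods may sample
    or rescale instead.
    """
    if not 0.0 <= pruning_rate < 1.0:
        raise ValueError("pruning_rate must be in [0, 1)")
    n_keep = max(1, int(round(len(scores) * (1.0 - pruning_rate))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy values, not taken from the paper.
toy_losses = [0.9, 0.1, 0.5, 0.2, 0.05]
toy_memory = [0.0, 1.0, 0.0, 0.1, 2.0]

loss_only = keep_indices(emp_style_scores(toy_losses, toy_memory, beta=0.0), 0.6)
with_memory = keep_indices(emp_style_scores(toy_losses, toy_memory, beta=1.0), 0.6)
```

With `beta=0.0` the selection reduces to pure loss ranking; a non-zero `beta` lets low-loss samples with a large memory value back into the retained set, which is the qualitative effect the abstract describes.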

Paper Structure

This paper contains 21 sections, 3 theorems, 25 equations, 15 figures, 12 tables, 2 algorithms.

Key Result

Theorem 1

For a set of $m$ samples, an independently and identically distributed subset of retained data $(\hat{X}, \hat{Y}) = \{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}$, let $\hat{y}^{(i)}$ represent the model's prediction for the $i$-th sample, and let $c^{(i)} = \mathbf{1} \{ \hat{y}^{(i)} = y^{(i)} \}$ ...

Figures (15)

  • Figure 1: Number of times each sample is selected over the entire training process (200 epochs) under different data pruning algorithms at a pruning rate of 90%. InfoBatch, Greedy, and UCB all score samples by their loss, an approach we refer to as Low-Frequency Learning (LFL).
  • Figure 2: Unlike other methods that use sample loss for scoring, we enhance model memory by adding a memory term $mem(x, \theta)$, where $\beta$ is an adjustable hyperparameter.
  • Figure 3: Training loss of different algorithms on different datasets at a pruning rate of 90%. ELFL denotes Extreme Low-Frequency Learning; InfoBatch is an LFL (Low-Frequency Learning) method.
  • Figure 4: On CIFAR10-ResNet50, loss statistics of a single sample when it is randomly removed, collected over 50 experiments; the red line shows the mean loss of the 50 samples.
  • Figure 5: Static pruning by random sampling on the CIFAR10 dataset with the ResNet50 model. The figure reports the average loss of the pruned data at different pruning rates; each data point is run 5 times, and the shaded area shows the error range.
  • ...and 10 more figures
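
The per-sample selection counts described in the Figure 1 caption could be collected with bookkeeping along the following lines. This is only a sketch under stated assumptions: `count_selections`, `frequency_histogram`, and the toy `rotating_selector` are hypothetical names, and the selector is a stand-in that does not implement InfoBatch, Greedy, UCB, or EMP.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def count_selections(
    num_samples: int,
    num_epochs: int,
    select: Callable[[int], Iterable[int]],
) -> List[int]:
    """Count how many epochs each sample index is selected in.

    `select(epoch)` is a caller-supplied (hypothetical) pruning rule that
    returns the indices retained for that epoch.
    """
    counts = [0] * num_samples
    for epoch in range(num_epochs):
        # A sample counts at most once per epoch, even if returned twice.
        for idx in set(select(epoch)):
            counts[idx] += 1
    return counts


def frequency_histogram(counts: List[int]) -> Dict[int, int]:
    """Map 'times selected' to 'number of samples with that count'."""
    return dict(sorted(Counter(counts).items()))


def rotating_selector(epoch: int) -> List[int]:
    """Toy rule: keep 1 of 10 samples per epoch (90% pruning), in rotation."""
    return [epoch % 10]


toy_counts = count_selections(num_samples=10, num_epochs=200, select=rotating_selector)
toy_histogram = frequency_histogram(toy_counts)
```

In this toy setting every sample is selected equally often, which mimics the evenly distributed training frequency that the abstract associates with high pruning rates; swapping in a real scoring rule for `rotating_selector` would give the kind of per-sample counts the caption refers to.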

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof