Dataset Distillation via Curriculum Data Synthesis in Large Data Era

Zeyuan Yin; Zhiqiang Shen

TL;DR

The proposed framework achieves the highest published accuracy to date on both large-scale ImageNet-1K and ImageNet-21K, with 63.2% under IPC 50 and 36.1% under IPC 20, respectively. It uses the standard input resolution of 224$\times$224, converges faster and needs less synthesis time.
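IPC (images per class) fixes the size of the distilled set: it is the number of classes times the IPC budget. The arithmetic below illustrates this for ImageNet-1K at IPC 50. It assumes the standard ILSVRC-2012 training split of 1,000 classes and 1,281,167 images. The helper `distilled_size` is hypothetical and used only for illustration.

```python
# Worked arithmetic: size of a distilled set for a given IPC budget.
# Assumption: standard ImageNet-1K (ILSVRC-2012) train split,
# 1,000 classes and 1,281,167 training images.
NUM_CLASSES_1K = 1_000
FULL_TRAIN_1K = 1_281_167


def distilled_size(num_classes: int, ipc: int) -> int:
    """Total number of synthetic images: classes x images-per-class."""
    return num_classes * ipc


n = distilled_size(NUM_CLASSES_1K, 50)
ratio = n / FULL_TRAIN_1K  # fraction of the full training set
print(n, f"{ratio:.2%}")
```

At IPC 50, the distilled ImageNet-1K set holds 50,000 images. That is roughly 3.9% of the original training set.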

Abstract

Dataset distillation, or condensation, aims to generate a smaller but representative subset of a large dataset. A model can then be trained on this subset more efficiently while still performing well when evaluated on the original test data distribution. Previous decoupled methods such as SRe$^2$L use a single, unified gradient update scheme to synthesize data from Gaussian noise. However, we observe that the first several update iterations determine the overall outline of the synthesized image. An improper gradient update strategy can therefore dramatically degrade the final generation quality. To address this, we introduce a simple yet effective global-to-local gradient refinement approach, enabled by curriculum data augmentation ($\texttt{CDA}$) during data synthesis. The proposed framework achieves the highest published accuracy to date on both large-scale ImageNet-1K and ImageNet-21K: 63.2% under IPC (Images Per Class) 50 and 36.1% under IPC 20, respectively. It uses the standard input resolution of 224$\times$224, converges faster and needs less synthesis time. The proposed model outperforms current state-of-the-art methods such as SRe$^2$L, TESLA, and MTT by more than 4% Top-1 accuracy on ImageNet-1K/21K. For the first time, it also narrows the gap to its full-data training counterpart to less than 15% (absolute). Moreover, this work is the first successful application of dataset distillation to the larger-scale ImageNet-21K dataset at the standard 224$\times$224 resolution. Our code and the distilled ImageNet-21K dataset (20 IPC, 2K recovery budget) are available at https://github.com/VILA-Lab/SRe2L/tree/main/CDA.
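The global-to-local idea can be pictured as a curriculum on the crop scale. Early synthesis iterations optimize the image through large, near-global crops. Later iterations admit progressively smaller, local crops. The following is a minimal sketch under stated assumptions. It moves the lower bound of the RandomResizedCrop area scale from `start` toward `end` until a `milestone` fraction of the iterations, then holds it fixed. The function name, parameter names, default values and the linear/cosine shapes are all hypothetical illustrations. They are not the exact CDA settings, which this section does not give.

```python
import math


def crop_scale_lower_bound(step: int, total_steps: int,
                           start: float = 1.0, end: float = 0.08,
                           milestone: float = 1.0,
                           mode: str = "cosine") -> float:
    """Hypothetical global-to-local curriculum for the lower bound of the
    RandomResizedCrop area scale.

    The bound starts at `start` (large, global crops) and moves to `end`
    (allowing small, local crops) by step `milestone * total_steps`.
    After that it stays at `end`. All defaults are illustrative assumptions.
    """
    horizon = max(1, int(milestone * total_steps))
    progress = min(step / horizon, 1.0)  # fraction of curriculum completed
    if mode == "linear":
        w = progress
    elif mode == "cosine":
        w = (1 - math.cos(math.pi * progress)) / 2  # smooth 0 -> 1 ramp
    else:
        raise ValueError(f"unknown mode: {mode}")
    return start + (end - start) * w


if __name__ == "__main__":
    # Print how the lower bound shrinks over a 1,000-iteration synthesis run.
    for t in (0, 250, 500, 750, 1000):
        print(t, round(crop_scale_lower_bound(t, 1000), 3))
```

At each synthesis iteration, the returned value could serve as the lower bound in `torchvision.transforms.RandomResizedCrop(224, scale=(lower, 1.0))`. The cosine shape is just one smooth choice. A step or linear schedule would express the same easy-to-hard ordering.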

Paper Structure (24 sections, 8 equations, 12 figures, 21 tables, 1 algorithm)

This paper contains 24 sections, 8 equations, 12 figures, 21 tables, 1 algorithm.

Introduction
Related Work
Approach
Preliminary: Dataset Distillation
Dataset Distillation on Large-scale Datasets
Global-to-local Gradient Update via Curriculum
Experiments
Datasets and Implementation Details
CIFAR-100
Tiny-ImageNet
ImageNet-1K
ImageNet-21K
Ablations
Analysis
Application: Continual Learning
...and 9 more sections

Figures (12)

Figure 1: ImageNet-1K comparison with SRe$^2$L.
Figure 2: Motivation of our work. The left column shows images synthesized from Gaussian noise after only a few gradient update iterations. The middle and right columns show the corresponding intermediate and final synthesized images.
Figure 3: Illustration of crop distributions under different lower and upper scale bounds in RandomResizedCrop. The first row shows the center points of bounding boxes sampled under different scale hyperparameters. The second and last rows show 30 and 10 boxes, respectively, drawn from the crop distributions. In each row, the difficulty of the crop distribution decreases from left to right.
Figure 4: Illustration of global-to-local data synthesis. This figure shows the specific curriculum procedure used in data synthesis and gives an overview of our dataset distillation framework. Optimization starts on a large image area (a single bounding box in each step) to build a better initialization. It then gradually narrows the optimized area so that learning can focus on finer details.
Figure 5: Crop-ratio schedulers of the prior CTL solution (left) and our curriculum-enabled global-to-local scheme (right). The colored regions depict the random sampling interval of the crop ratio at each iteration under each scheduler.
...and 7 more figures
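The crop-distribution figures turn on how the scale bounds of RandomResizedCrop control crop size and placement. The sketch below approximates RandomResizedCrop's sampling rule. The area fraction is drawn uniformly from the scale interval, and the aspect ratio is drawn log-uniformly from (3/4, 4/3), torchvision's default. As a simplifying assumption, it clamps oversized boxes to the image. torchvision instead retries and then falls back to a center crop. `sample_crop` and `crop_stats` are hypothetical helpers for illustration, not part of the CDA code.

```python
import math
import random


def sample_crop(height, width, scale, ratio=(3 / 4, 4 / 3), rng=random):
    """Sample one (top, left, h, w) box in the style of RandomResizedCrop.

    Area fraction ~ U(scale); aspect ratio is log-uniform within `ratio`.
    Simplification (assumption): oversized boxes are clamped to the image
    rather than re-sampled.
    """
    area = height * width * rng.uniform(*scale)
    aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
    w = min(width, max(1, round(math.sqrt(area * aspect))))
    h = min(height, max(1, round(math.sqrt(area / aspect))))
    top = rng.randint(0, height - h)   # randint is inclusive on both ends
    left = rng.randint(0, width - w)
    return top, left, h, w


def crop_stats(scale, n=1000, size=224, seed=0):
    """Mean covered-area fraction and box centers for n sampled crops."""
    rng = random.Random(seed)
    boxes = [sample_crop(size, size, scale, rng=rng) for _ in range(n)]
    mean_area = sum(h * w for _, _, h, w in boxes) / (n * size * size)
    centers = [(t + h / 2, l + w / 2) for t, l, h, w in boxes]
    return mean_area, centers


if __name__ == "__main__":
    # Raising the lower bound keeps crops large and near-global.
    for bounds in [(0.08, 1.0), (0.5, 1.0), (0.9, 1.0)]:
        mean_area, _ = crop_stats(bounds)
        print(bounds, round(mean_area, 3))
```

Plotting the returned `centers` for several bounds gives a rough analogue of the center-point row in the crop-distribution figure. A small lower bound scatters small boxes widely over the image. A bound near 1.0 yields large boxes whose centers cluster near the middle.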

Theorems & Definitions (1)

Definition 1: Curriculum Data Synthesis