Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective

Zeyuan Yin, Eric Xing, Zhiqiang Shen

TL;DR

This work addresses the scalability bottleneck of dataset condensation by introducing Squeeze, Recover and Relabel (SRe$^2$L), a decoupled, unilevel training pipeline that avoids the computational burden of bilevel optimization. It first squeezes the original data into a model, then recovers condensed images by aligning BN statistics and applying crop-based regularization, and finally relabels them with soft labels derived via crop-aware distillation. On Tiny-ImageNet and full ImageNet-1K, SRe$^2$L achieves state-of-the-art results at IPC=50 (e.g., $60.8\%$ on ImageNet-1K) and delivers substantial speedups and memory savings over prior methods, while enabling condensation across resolutions and architectures, including ViTs with BN adaptations. The approach also demonstrates robust cross-architecture generalization, improved visual quality of synthetic data, and applicability to continual learning, underscoring its practical value for large-scale, resource-constrained data synthesis.
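The recover stage aligns the statistics of the synthetic batch with those stored in the squeezed model's BN layers. Below is a minimal NumPy sketch of such an alignment term for a single layer. It assumes an L2 distance between per-channel batch statistics and the stored running statistics; the function name `bn_alignment_loss` and the toy statistics are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def bn_alignment_loss(features, running_mean, running_var):
    """Distance between batch statistics and stored BN statistics (one layer).

    features: array of shape (N, C, H, W), activations entering a BN layer.
    running_mean, running_var: arrays of shape (C,), assumed to be the
    statistics stored in the trained model's BN layer.
    """
    # Per-channel statistics over the batch and spatial dimensions.
    batch_mean = features.mean(axis=(0, 2, 3))
    batch_var = features.var(axis=(0, 2, 3))
    # Assumption: L2 norm of the mean gap plus L2 norm of the variance gap.
    return (np.linalg.norm(batch_mean - running_mean)
            + np.linalg.norm(batch_var - running_var))

rng = np.random.default_rng(0)

# Hypothetical stored statistics for a 3-channel BN layer.
running_mean = np.array([0.5, -1.0, 2.0])
running_var = np.array([1.0, 4.0, 0.25])

# Features drawn to match the stored statistics.
matched = rng.normal(
    loc=running_mean[None, :, None, None],
    scale=np.sqrt(running_var)[None, :, None, None],
    size=(64, 3, 8, 8),
)
# Features drawn from standard Gaussian noise, ignoring the stored statistics.
noise = rng.normal(size=(64, 3, 8, 8))

loss_matched = bn_alignment_loss(matched, running_mean, running_var)
loss_noise = bn_alignment_loss(noise, running_mean, running_var)
print(f"matched: {loss_matched:.3f}  noise: {loss_noise:.3f}")
```

In a full recovery loop this term would be summed over all BN layers and minimized with respect to the synthetic images, alongside a classification loss; the sketch only shows that the term is small when the batch statistics agree with the stored ones.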

Abstract

We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe$^2$L) that decouples the bilevel optimization of model and synthetic data during training, so that it can handle varying dataset scales, model architectures, and image resolutions efficiently. The proposed method is flexible across dataset scales and offers several advantages: synthesized images of arbitrary resolution, low training cost and memory consumption under high-resolution synthesis, and the ability to scale to arbitrary evaluation network architectures. Extensive experiments are conducted on the Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest validation accuracy of 42.5% on Tiny-ImageNet and 60.8% on ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. During data synthesis, our approach is also approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster than MTT, while consuming 11.6$\times$ and 6.4$\times$ less memory. Our code and condensed datasets at 50 and 200 IPC with a 4K recovery budget are available at https://github.com/VILA-Lab/SRe2L.

Paper Structure (22 sections, 20 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Left: data synthesis time vs. accuracy on ImageNet-1K with 10 IPC (Images Per Class). Models include ConvNet-4 and ResNet-{18, 50, 101}. $^\dag$ indicates ViT with 10M parameters [cui2022dc]. Right: comparison of the widely used bilevel optimization and our proposed decoupled training scheme.
  • Figure 2: Overview of our framework. It consists of three stages. In the first stage, a model is trained from scratch to capture most of the crucial information in the original dataset. In the second stage, a recovery process synthesizes the target data from Gaussian noise. In the third stage, we relabel the synthetic data at the crop level to reflect the true label of the data.
  • Figure 3: Visualization of distilled examples on ImageNet-1K under various regularization terms and crop augmentation settings. Selected classes are {Volcano, Hammerhead Shark, Bee, Valley}.
  • Figure 4: Top-1 val accuracy of models trained on various labels and temperature settings under IPC 50. T and S denote the reference model for relabeling and the target model to be trained, respectively. R18, R50, and R101 abbreviate ResNet-18, ResNet-50, and ResNet-101.
  • Figure 5: Visualization of MTT [cazenavette2022distillation] and our SRe$^2$L. The upper two rows are synthetic Tiny-ImageNet and the lower two rows are synthetic ImageNet-1K (in each pair, the first row is MTT and the second is ours).
  • ...and 5 more figures
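
The crop-level relabeling stage in the Figure 2 caption, together with the temperature settings compared in Figure 4, can be illustrated with a small NumPy sketch. It assumes plain random crops and a temperature-scaled softmax over the reference model's logits. `toy_teacher` is a hypothetical stand-in (global average pooling followed by a linear layer) for the trained reference model, and all shapes and values are illustrative only.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer labels."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def random_crop(image, crop_size, rng):
    """Cut a random square crop from an image of shape (C, H, W)."""
    _, h, w = image.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[:, top:top + crop_size, left:left + crop_size]

def toy_teacher(crop, weights):
    """Hypothetical reference model: average-pool each channel, then a linear map."""
    pooled = crop.mean(axis=(1, 2))   # shape (C,)
    return pooled @ weights           # shape (num_classes,)

def relabel_crops(image, weights, num_crops, crop_size, temperature, rng):
    """Assign one soft label per crop rather than one label per image."""
    crops, labels = [], []
    for _ in range(num_crops):
        crop = random_crop(image, crop_size, rng)
        crops.append(crop)
        labels.append(softmax(toy_teacher(crop, weights), temperature))
    return np.stack(crops), np.stack(labels)

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 16, 16))   # one synthetic image, 3 channels
weights = rng.normal(size=(3, 5))      # toy classifier with 5 classes

crops, soft_labels = relabel_crops(
    image, weights, num_crops=4, crop_size=8, temperature=2.0, rng=rng
)
print(crops.shape, soft_labels.shape)
```

Each crop receives its own probability vector, so the label follows what is actually visible in that crop. The student model would then be trained on these (crop, soft label) pairs.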