Dataset Condensation with Gradient Matching

Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

TL;DR

The paper tackles the high data and compute costs of training deep models by introducing Dataset Condensation, which learns a small set of synthetic samples intended to train networks from scratch. It advances from a parameter-matching view to a curriculum gradient-matching framework that aligns training dynamics (gradients) between real and synthetic data, avoiding costly inner-loop unrolling. Empirical results across MNIST, SVHN, Fashion-MNIST, and CIFAR-10 show that a tiny number of synthetic examples can nearly match full-dataset performance and generalize across architectures, outperforming coreset methods and Dataset Distillation. The method demonstrates practical benefits for continual learning and neural architecture search, enabling efficient training with far lower memory and computation requirements.

Abstract

As state-of-the-art machine learning methods in many fields rely on ever larger datasets, storing those datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch. We formulate this goal as a gradient matching problem between the gradients of deep neural network weights that are trained on the original and our synthetic data. We rigorously evaluate its performance on several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods. Finally, we explore the use of our method in continual learning and neural architecture search and report promising gains when limited memory and computation are available.
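
To make the gradient matching idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear softmax classifier in place of a deep network, random arrays in place of real images, and a distance that sums one minus the cosine similarity over output rows of the weight gradient. All names (`ce_grad`, `gradient_distance`, `X_syn`) are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(W, X, y):
    """Gradient of the mean cross-entropy loss w.r.t. W for a linear
    softmax classifier. W has shape (classes, features)."""
    p = softmax(X @ W.T)
    p[np.arange(len(y)), y] -= 1.0      # dL/dlogits = softmax - one_hot
    return p.T @ X / len(y)

def gradient_distance(g_real, g_syn, eps=1e-8):
    """Assumed matching loss: sum over output rows of (1 - cosine similarity)."""
    num = (g_real * g_syn).sum(axis=1)
    den = np.linalg.norm(g_real, axis=1) * np.linalg.norm(g_syn, axis=1) + eps
    return float((1.0 - num / den).sum())

rng = np.random.default_rng(0)
num_classes, dim = 3, 5
W = rng.normal(size=(num_classes, dim))            # current model weights

# Stand-ins for a real batch and a synthetic set (1 sample per class).
X_real = rng.normal(size=(60, dim))
y_real = rng.integers(0, num_classes, size=60)
X_syn = rng.normal(size=(num_classes, dim))
y_syn = np.arange(num_classes)

g_real = ce_grad(W, X_real, y_real)
g_syn = ce_grad(W, X_syn, y_syn)
loss = gradient_distance(g_real, g_syn)
print(f"gradient matching loss: {loss:.4f}")
```

The sketch only evaluates the matching loss at one set of weights. In a full condensation procedure, the synthetic samples `X_syn` would themselves be updated to reduce this loss, typically with an automatic differentiation framework.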

Paper Structure

This paper contains 47 sections, 10 equations, 15 figures, 8 tables, 1 algorithm.

Figures (15)

  • Figure 1: Dataset Condensation (left) aims to generate a small set of synthetic images that can match the performance of a network trained on a large image dataset. Our method (right) realizes this goal by learning a synthetic set such that a deep network trained on it and the large set produces similar gradients w.r.t. its weights. The synthetic data can later be used to train a network from scratch in a small fraction of the original computational load. CE denotes Cross-Entropy.
  • Figure 2: Visualization of condensed $1$ image/class with ConvNet for MNIST, FashionMNIST, SVHN and CIFAR10.
  • Figure 3: Cross-architecture performance in testing accuracy ($\%$) for condensed $1$ image/class in MNIST.
  • Figure 4: Comparison to DD (Wang et al., 2018) in terms of testing accuracy (%).
  • Figure 5: Neural Architecture Search. Methods are compared in performance, ranking correlation, time and memory cost.
  • ...and 10 more figures