Distilling Datasets Into Less Than One Image

Asaf Shul; Eliahu Horwitz; Yedid Hoshen

TL;DR

This paper introduces Poster Dataset Distillation (PoDD), a new setting that distills an entire dataset into less than one image-per-class (IPC) by representing the dataset as a single poster. The approach extracts overlapping patches from the poster to form a differentiable, patch-based training set. It adds learnable patch labels (PoDDL) and a CLIP-guided class ordering (PoCO) that maximizes pixel sharing between classes. Empirically, PoDD achieves state-of-the-art performance with as little as $0.3$ IPC and sets a new $1$ IPC SoTA on CIFAR-10, CIFAR-100, and CUB200, while remaining competitive on Tiny-ImageNet. The work opens new research directions in data-efficient distillation, including alternative class orderings, labeling schemes, and extensions beyond the sub-$1$ IPC regime, with potential environmental and resource benefits for large-scale learning.
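The class ordering mentioned above aims to place semantically related classes next to each other so they can share poster pixels. The sketch below is a hypothetical stand-in, not the paper's PoCO procedure (given as pseudocode in Figure 4). It greedily chains classes by the cosine similarity of their embeddings. The toy 3-dimensional vectors stand in for CLIP text embeddings, which are assumed to be precomputed. `greedy_class_order` is an illustrative name, and the code does not show how the resulting sequence would be laid out on the 2D poster.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_class_order(embeddings, start=0):
    """Hypothetical ordering: repeatedly append the unplaced class most similar to the last one."""
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        last = embeddings[order[-1]]
        # sorted() makes ties resolve deterministically to the lowest class index
        nxt = max(sorted(remaining), key=lambda j: cosine(last, embeddings[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy stand-ins for CLIP text embeddings (assumed, not real CLIP outputs).
names = ["cat", "truck", "dog", "car"]
emb = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0], [0.9, 0.2, 0.0], [0.1, 0.0, 0.9]]
order = greedy_class_order(emb)
print([names[i] for i in order])  # ['cat', 'dog', 'car', 'truck']
```

With these vectors the chain keeps the two animals adjacent and the two vehicles adjacent, which is the property a pixel-sharing layout would want.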

Abstract

Dataset distillation aims to compress a dataset into a much smaller one so that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy for a budget of K distilled images-per-class, where K is a positive integer. In this paper, we push the boundaries of dataset distillation, compressing the dataset into less than an image-per-class. It is important to realize that the meaningful quantity is not the number of distilled images-per-class but the number of distilled pixels-per-dataset. We therefore propose Poster Dataset Distillation (PoDD), a new approach that distills the entire original dataset into a single poster. The poster approach motivates new technical solutions for creating training images and learnable labels. With less than an image-per-class, our method achieves comparable or better performance than existing methods that use one image-per-class. Specifically, our method establishes new state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 using as little as 0.3 images-per-class.
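The abstract's point that the budget is better measured in pixels-per-dataset than in images-per-class can be made concrete with a little arithmetic. The helper names below are hypothetical, channels are ignored, and the 200×200 poster is an illustrative size chosen to match the roughly 40k pixels quoted in Figure 1, not a configuration taken from the paper.

```python
def pixels_per_dataset(ipc, num_classes, height, width):
    """Spatial pixel budget of a distilled set with `ipc` images per class (channels ignored)."""
    return ipc * num_classes * height * width


def equivalent_ipc(poster_height, poster_width, num_classes, height, width):
    """Fractional IPC of a single poster: its pixel count over that of one image per class."""
    return (poster_height * poster_width) / (num_classes * height * width)


# CIFAR-100: 100 classes of 32x32 images.
print(pixels_per_dataset(1, 100, 32, 32))     # 102400 pixels at 1 IPC
print(equivalent_ipc(200, 200, 100, 32, 32))  # 0.390625 for a hypothetical 200x200 poster
```

Because a poster's size is not tied to whole images, any fractional budget is expressible; for example, $0.3$ IPC on CIFAR-100 corresponds to $0.3 \times 102{,}400 = 30{,}720$ pixels.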

Paper Structure (14 sections, 3 equations, 10 figures, 2 tables, 2 algorithms)

Introduction
Related Works
Preliminaries
PoDD: Poster Dataset Distillation
A Shared Poster Representation
PoCO: Poster Class Ordering
PoDDL: Poster Dataset Distillation Labeling
Experiments
Experimental Setting
Results
Ablations
Discussion and Future Work
Conclusion
Broader Impact

Figures (10)

Figure 1: Poster Dataset Distillation (PoDD): We propose PoDD, a new dataset distillation setting for a tiny budget of under $1$ image-per-class (IPC). In this example, the standard method attains an accuracy of $35.5\%$ on CIFAR-100 with approximately $100$k pixels, whereas PoDD achieves an accuracy of $35.7\%$ with less than half the pixels (roughly $40$k).
Figure 2: Dataset Compression Scale: Methods are shown from least to most compressed, left to right. The original dataset contains all of the training data and performs no compression. Coreset methods select a subset of the original dataset without modifying the images. Dataset distillation methods compress an entire dataset by synthesizing $K \in \mathbb{N}^{+}$ images-per-class (IPC). Our method, Poster Dataset Distillation (PoDD), distills an entire dataset into a single poster that achieves the same performance as $1$ IPC while using as little as $0.3$ IPC.
Figure 3: PoDD Overview: We propose PoDD, a new dataset distillation setting for under $1$ image-per-class. We start by initializing a random poster (a). During distillation, we optimize overlapping patches and soft labels (b-c). The final distilled poster has fewer pixels than the combined pixels of the individual images (d). During inference, we extract overlapping patches and soft labels from the distilled poster and use them to train a downstream model (e-f). PoDD achieves comparable or better accuracy than current methods while using as little as a third of the pixels.
Figure 4: PoCO: Pseudocode for PoCO class ordering
Figure 5: PoDDL Extraction: Each poster patch has a corresponding patch in the label array (a-b). We compute the poster patch label by extracting a patch along the channels of the label array (c). To obtain the final soft label for a given poster patch, we pool and normalize the extracted label window, resulting in a soft label vector (d). PoDDL supports both fixed and learned labels
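Figures 3 and 5 describe how training data is read off a distilled poster: overlapping windows give the patches, and the matching window of a label array, pooled and normalized, gives each patch's soft label. The NumPy sketch below illustrates only that indexing, under stated assumptions. The label array is assumed to be pixel-aligned with the poster (the paper's label grid may be coarser), pooling is assumed to be a spatial mean, and normalization is assumed to be a softmax. The function name and the patch/stride values are hypothetical. A real distillation loop would run this in an autodiff framework so that gradients reach both the poster and the labels.

```python
import numpy as np


def extract_patches_and_labels(poster, label_array, patch, stride):
    """Return overlapping poster patches and one soft label per patch.

    poster:      (C, H, W) distilled pixels
    label_array: (K, H, W) per-class label logits; pixel alignment with the
                 poster is an assumption of this sketch
    """
    _, H, W = poster.shape
    patches, labels = [], []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            # Windows overlap whenever stride < patch, so neighbours share pixels.
            patches.append(poster[:, y:y + patch, x:x + patch])
            # Same window taken across all K channels of the label array.
            window = label_array[:, y:y + patch, x:x + patch]
            pooled = window.mean(axis=(1, 2))    # assumed: average-pool to a length-K vector
            exp = np.exp(pooled - pooled.max())  # assumed: numerically stable softmax
            labels.append(exp / exp.sum())
    return np.stack(patches), np.stack(labels)


# Toy example: a 3x64x96 poster and a 10-class label array.
rng = np.random.default_rng(0)
poster = rng.standard_normal((3, 64, 96))
label_array = np.zeros((10, 64, 96))
label_array[0, :, :48] = 5.0  # left part of the poster leans towards class 0
label_array[1, :, 48:] = 5.0  # right part leans towards class 1
patches, soft_labels = extract_patches_and_labels(poster, label_array, patch=32, stride=16)
print(patches.shape, soft_labels.shape)  # (15, 3, 32, 32) (15, 10)
```

A patch that straddles the two halves receives a mixed soft label, which is how a shared poster region can serve more than one class.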
