Distilling Datasets Into Less Than One Image
Asaf Shul, Eliahu Horwitz, Yedid Hoshen
TL;DR
This paper introduces Poster Dataset Distillation (PoDD), a new setting that distills an entire dataset into less than one image-per-class (IPC) by representing the dataset as a single poster. Overlapping patches cropped from the poster form a differentiable training set, with learnable patch labels (PoDDL) and a CLIP-guided class ordering (PoCO) that maximizes cross-class pixel sharing. Empirically, PoDD achieves state-of-the-art (SoTA) performance with as little as $0.3$ IPC and sets a new SoTA at $1$ IPC on CIFAR-10, CIFAR-100, and CUB200, while remaining competitive on Tiny-ImageNet. The work opens new research directions in data-efficient distillation, including alternative orderings, labeling schemes, and extensions beyond the sub-$1$ IPC regime, with potential environmental and resource benefits for large-scale learning.
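To make the patch idea concrete, here is a minimal sketch of cropping overlapping patches from a single poster array. The function name `extract_overlapping_patches`, the poster shape, and the stride are illustrative assumptions, not the authors' implementation; in the actual method the poster would be a learnable tensor and the cropping would run inside an autodiff framework.

```python
import numpy as np

def extract_overlapping_patches(poster, patch_h, patch_w, stride_h, stride_w):
    """Slide a window over the poster and stack every crop.

    Hypothetical helper: when the stride is smaller than the patch size,
    neighbouring patches share pixels.
    """
    height, width = poster.shape[:2]
    patches = []
    for top in range(0, height - patch_h + 1, stride_h):
        for left in range(0, width - patch_w + 1, stride_w):
            patches.append(poster[top:top + patch_h, left:left + patch_w])
    return np.stack(patches)

# Assumed toy setup: a 48x64 RGB poster, 32x32 patches, stride of 8 pixels.
rng = np.random.default_rng(0)
poster = rng.random((48, 64, 3))
patches = extract_overlapping_patches(poster, 32, 32, 8, 8)
print(patches.shape)  # (15, 32, 32, 3)
```

With these assumed sizes, the poster holds only three images' worth of pixels yet yields 15 training patches, because each patch reuses most of its neighbour's pixels.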
Abstract
Dataset distillation aims to compress a dataset into a much smaller one so that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy for a budget of K distilled images-per-class, where K is a positive integer. In this paper, we push the boundaries of dataset distillation, compressing the dataset into less than an image-per-class. It is important to realize that the meaningful quantity is not the number of distilled images-per-class but the number of distilled pixels-per-dataset. We therefore propose Poster Dataset Distillation (PoDD), a new approach that distills the entire original dataset into a single poster. The poster approach motivates new technical solutions for creating training images and learnable labels. Our method can achieve comparable or better performance with less than an image-per-class compared to existing methods that use one image-per-class. Specifically, our method establishes new state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 using as little as 0.3 images-per-class.
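The pixels-per-dataset view can be illustrated with a short calculation that converts a poster's pixel count into an equivalent images-per-class figure. The helper `equivalent_ipc` and the 48x64 poster size are assumptions chosen for illustration; only the 32x32 image size and 10 classes of CIFAR-10 are standard facts.

```python
def equivalent_ipc(poster_h, poster_w, image_h, image_w, num_classes):
    """Express a poster's pixel budget as images-per-class (hypothetical helper)."""
    poster_pixels = poster_h * poster_w
    pixels_for_one_ipc = image_h * image_w * num_classes
    return poster_pixels / pixels_for_one_ipc

# Assumed poster of 48x64 pixels for a dataset of 32x32 images with 10 classes.
ipc = equivalent_ipc(48, 64, 32, 32, 10)
print(ipc)  # 0.3
```

Under these assumptions the whole distilled dataset occupies 3,072 pixels, the area of three 32x32 images shared across ten classes, which a per-class integer budget K cannot express.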
