Color-Oriented Redundancy Reduction in Dataset Distillation
Bowen Yuan, Zijian Wang, Mahsa Baktashmotlagh, Yadan Luo, Zi Huang
TL;DR
This work tackles the inefficiency of large-scale image datasets by targeting color-space redundancy in Dataset Distillation (DD). It introduces AutoPalette, a framework with two components: a palette network that reduces each image from 256 colors to a palette of $K$ colors, and a color-guided initialization that selects diverse samples by submodular information gain under a graph-cut objective. The framework is optimized under the standard distillation loss with added palette losses. Key contributions include the palette-specific losses $\mathcal{L}_{m}$, $\mathcal{L}_{b}$, and $\mathcal{L}_{a}$, a storage analysis that quantifies the implications of color depth, and extensive experiments showing competitive or superior performance on CIFAR-10/100 and ImageNet subsets under tight storage budgets. The approach preserves essential discriminative features while enabling memory-efficient distillation, and it is compatible with multiple DD frameworks, which supports efficient model training with a reduced data footprint.
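To make the idea of a reduced palette concrete, the following minimal sketch maps each pixel to its nearest color in a fixed palette of $K$ entries and estimates the storage of the resulting indexed image. It is an illustration only: AutoPalette learns the color assignment with a palette network, whereas this sketch swaps in a plain nearest-color lookup. The function names `quantize` and `storage_bits`, and the storage accounting (one index per pixel plus the palette table), are assumptions introduced here and are not taken from the paper's storage analysis.

```python
def quantize(pixels, palette):
    """Return, for each RGB pixel, the index of its nearest palette color.

    Hypothetical stand-in for a learned palette network: a fixed
    nearest-neighbor lookup under squared Euclidean distance.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(palette)), key=lambda k: dist2(p, palette[k]))
        for p in pixels
    ]


def storage_bits(num_pixels, k, channel_bits=8, channels=3):
    """Bits needed for an indexed image with a K-color palette.

    Assumed accounting: each pixel stores a palette index, and the
    palette itself stores K full-precision colors.
    """
    index_bits = max(1, (k - 1).bit_length())  # bits to address K colors
    return num_pixels * index_bits + k * channels * channel_bits


palette = [(0, 0, 0), (255, 0, 0), (0, 0, 255)]
pixels = [(250, 5, 5), (10, 10, 240), (0, 0, 0)]
indices = quantize(pixels, palette)             # [1, 2, 0]

full_bits = 32 * 32 * 3 * 8                     # 24576 bits for a raw 32x32 RGB image
indexed_bits = storage_bits(32 * 32, k=16)      # 4480 bits with a 16-color palette
```

Under this assumed accounting, a 32x32 image with a 16-color palette takes roughly a fifth of the raw storage, which is the kind of saving that lets more distilled images fit into a fixed budget.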
Abstract
Dataset Distillation (DD) is designed to generate condensed representations of extensive image datasets, enhancing training efficiency. Despite recent advances, there remains considerable potential for improvement, particularly in addressing the notable redundancy within the color space of distilled images. In this paper, we propose AutoPalette, a framework that minimizes color redundancy at both the individual image level and the overall dataset level. At the image level, we employ a palette network, a specialized neural network, to dynamically allocate colors from a reduced color space to each pixel. The palette network identifies the areas of synthetic images that are essential for model training and assigns more unique colors to them. At the dataset level, we develop a color-guided initialization strategy to minimize redundancy among images. Representative images with the least replicated color patterns are selected based on information gain. A comprehensive performance study involving various datasets and evaluation scenarios demonstrates the superior performance of our proposed color-aware DD compared to existing DD methods. The code is available at https://github.com/KeViNYuAn0314/AutoPalette.
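The dataset-level selection can be pictured with a small greedy sketch. It assumes the commonly used graph-cut submodular form $f(S) = \sum_{i \in V}\sum_{j \in S} s_{ij} - \lambda \sum_{i,j \in S} s_{ij}$, where $s_{ij}$ is a pairwise similarity between the color features of images $i$ and $j$. The matrix `sim`, the weight `lam`, and the function names are hypothetical, and the exact objective and features used in AutoPalette may differ.

```python
def graph_cut_gain(sim, selected, v, lam=1.0):
    """Marginal gain f(S + {v}) - f(S) of the assumed graph-cut objective."""
    # Coverage: how similar v is to every image in the full pool.
    coverage = sum(sim[i][v] for i in range(len(sim)))
    # Redundancy: similarity to already selected images (counted in both
    # directions) plus the self-similarity term.
    redundancy = 2 * sum(sim[v][j] for j in selected) + sim[v][v]
    return coverage - lam * redundancy


def greedy_select(sim, budget, lam=1.0):
    """Greedily pick up to `budget` indices with the largest marginal gain."""
    selected = []
    candidates = set(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        best = max(
            sorted(candidates),
            key=lambda v: graph_cut_gain(sim, selected, v, lam),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy similarity matrix: images 0 and 1 have near-duplicate color patterns.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.2, 0.1, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
chosen = greedy_select(sim, budget=2)  # [0, 3]
```

In this toy example the second pick skips image 1, because its high similarity to the already selected image 0 outweighs its coverage, which mirrors the goal of choosing images with the least replicated color patterns.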
