Color-Oriented Redundancy Reduction in Dataset Distillation

Bowen Yuan, Zijian Wang, Mahsa Baktashmotlagh, Yadan Luo, Zi Huang

TL;DR

This work tackles the inefficiency of large-scale image datasets by targeting color-space redundancy in Dataset Distillation (DD). It introduces AutoPalette, a two-component framework: a palette network that condenses each image from 256 colors to a reduced palette of size $K$, and a color-guided initialization that selects diverse samples using submodular information gain with a graph-cut objective. The palette network is trained with the standard distillation loss $\mathcal{L}_{task}$ together with the palette-specific losses $\mathcal{L}_{m}$, $\mathcal{L}_{b}$, and $\mathcal{L}_{a}$, while the synthetic dataset itself is updated with $\mathcal{L}_{task}$ alone. Key contributions include these palette losses, a storage analysis that quantifies the implications of color depth, and extensive experiments showing competitive or superior performance on CIFAR-10/100 and ImageNet subsets under tight storage budgets. The approach preserves essential discriminative features while enabling memory-efficient distillation, and it is compatible with multiple DD frameworks.
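To make the storage argument concrete, the following is a minimal sketch of how color depth affects the per-image footprint. It assumes a generic indexed-color layout (one palette index per pixel plus a stored palette of $K$ RGB entries); this layout, the function names, and the 32x32 example are illustrative assumptions and may differ from the paper's own storage analysis.

```python
def full_color_image_bits(height, width, channel_bits=8, channels=3):
    """Bits needed to store one image at full color depth."""
    return height * width * channels * channel_bits


def indexed_image_bits(height, width, k, channel_bits=8, channels=3):
    """Bits for one image stored as per-pixel palette indices plus the palette.

    Assumption: each pixel stores an index into a K-entry palette, and the
    palette itself is stored at full color depth.
    """
    # Smallest number of bits that can address k palette entries (at least 1).
    index_bits = max(1, (k - 1).bit_length())
    palette_bits = k * channels * channel_bits
    return height * width * index_bits + palette_bits


# Hypothetical example: a CIFAR-sized 32x32 image with a 4-color palette.
full = full_color_image_bits(32, 32)
reduced = indexed_image_bits(32, 32, k=4)
images_per_full_slot = full // reduced  # color-reduced images per full-color image
```

Under these assumptions, a smaller $K$ lets more synthetic images fit in the same budget, which is the trade-off the storage analysis examines.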

Abstract

Dataset Distillation (DD) is designed to generate condensed representations of extensive image datasets, enhancing training efficiency. Despite recent advances, there remains considerable potential for improvement, particularly in addressing the notable redundancy within the color space of distilled images. In this paper, we propose AutoPalette, a framework that minimizes color redundancy at both the individual-image level and the dataset level. At the image level, we employ a palette network, a specialized neural network, to dynamically allocate colors from a reduced color space to each pixel. The palette network identifies the areas of synthetic images that are essential for model training and assigns more unique colors to them. At the dataset level, we develop a color-guided initialization strategy to minimize redundancy among images. Representative images with the least replicated color patterns are selected based on information gain. A comprehensive performance study involving various datasets and evaluation scenarios demonstrates the superior performance of our proposed color-aware DD compared to existing DD methods. The code is available at https://github.com/KeViNYuAn0314/AutoPalette.
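The dataset-level selection can be illustrated with a small sketch of greedy maximization of a graph-cut function, $f(S) = \sum_{i \in V} \sum_{j \in S} s_{ij} - \lambda \sum_{i, j \in S} s_{ij}$. This is a generic formulation, not the authors' implementation: the similarity matrix `sim`, the weight `lam`, and the function name are assumptions, and the way the paper derives similarities from quantized images is not reproduced here.

```python
import numpy as np


def graph_cut_greedy(sim, budget, lam=1.0):
    """Greedily pick `budget` items that maximize a graph-cut objective.

    sim: symmetric (n, n) similarity matrix (assumed; e.g. computed from
         color features of quantized images).
    lam: weight of the redundancy penalty among selected items.
    """
    n = sim.shape[0]
    coverage = sim.sum(axis=0)        # similarity of each candidate to all items
    redundancy = np.zeros(n)          # similarity of each candidate to the selected set
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(min(budget, n)):
        # Marginal gain of adding candidate v to the current selection
        # (valid for a symmetric similarity matrix).
        gain = coverage - lam * (2.0 * redundancy + np.diag(sim))
        gain[~available] = -np.inf    # never re-select an item
        v = int(np.argmax(gain))
        selected.append(v)
        available[v] = False
        redundancy += sim[:, v]
    return selected


# Hypothetical usage: four items forming two groups of identical color features.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
picked = graph_cut_greedy(features @ features.T, budget=2)
```

In this toy case the second pick comes from the group not yet represented, since an item similar to one already selected incurs the redundancy penalty.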

Paper Structure

This paper contains 25 sections, 20 equations, 12 figures, 10 tables, 1 algorithm.

Figures (12)

  • Figure 1: The overview of the proposed AutoPalette framework. Initialization: We compare the information gain of quantized images to select the images used in the initialization stage. Training: We forward the synthetic data to the palette network to obtain the color-reduced images. The objective functions of the palette network include $\mathcal{L}_a$, $\mathcal{L}_b$, $\mathcal{L}_m$ and $\mathcal{L}_{task}$. The synthetic dataset is updated by optimizing $\mathcal{L}_{task}$ alone.
  • Figure 2: Visualization of (a) images under 8-, 6-, 3-, and 1-bit color depths and (b-c) color-condensed synthetic images with their color palettes: (b) our full model; (c) our full model without the palette loss. Larger differences among the rows of a color palette indicate better color utilization.
  • Figure 3: Comparison between the performance of submodular color diversity initialization and random real images initialization.
  • Figure 4: CIFAR10 color condensed synthetic images with ZCA whitening.
  • Figure 5: CIFAR10 color condensed synthetic images without ZCA whitening.
  • ...and 7 more figures