Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou

TL;DR

The paper addresses two inefficiencies of class-specific dataset distillation: feature duplication within a class and neglect of inter-class features. It introduces INFER, which uses a Universal Feature Compensator (UFC) to generate multiple synthetic instances from a single compensator and to integrate inter-class information, and which supports linear label interpolation with static soft labels to drastically reduce label storage. In experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1k, INFER achieves state-of-the-art or competitive results at very low compression ratios, with better efficiency and generalization than prior methods such as SRe2L and G-VBSM. Overall, INFER offers a scalable, cost-effective paradigm that uses the distillation budget more efficiently and yields clearer decision boundaries, which matters in practice for large-scale dataset condensation.

Abstract

Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation. In practice, INFER demonstrates state-of-the-art performance across benchmark datasets. For instance, in the ipc = 50 setting on ImageNet-1k with the same compression level, it outperforms SRe2L by 34.5% using ResNet18.
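The two mechanisms named in the abstract, adding a compensator to natural instances and linearly interpolating labels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `compensate` and `mix`, the array shapes, and the mixup-style interpolation with a fixed coefficient `lam` are all assumptions made for illustration.

```python
import numpy as np

def compensate(natural, ufcs):
    """Add every compensator to every natural instance (hypothetical helper).

    natural: (N, C, H, W) natural images
    ufcs:    (K, C, H, W) compensators
    returns: (K, N, C, H, W), so K compensators yield K * N synthetic instances
    """
    return natural[None, ...] + ufcs[:, None, ...]

def mix(x_a, y_a, x_b, y_b, lam):
    """Mixup-style linear interpolation of two instances and their labels.

    Assumption: the interpolation coefficient lam lies in [0, 1].
    """
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

rng = np.random.default_rng(0)
natural = rng.random((4, 3, 8, 8))                  # 4 toy "natural" images
ufcs = rng.normal(scale=0.1, size=(2, 3, 8, 8))     # 2 toy compensators
synthetic = compensate(natural, ufcs)               # generated on the fly

# Interpolate two synthetic instances with one-hot labels for classes 0 and 1.
y0 = np.array([1.0, 0.0, 0.0, 0.0])
y1 = np.array([0.0, 1.0, 0.0, 0.0])
x_mix, y_mix = mix(synthetic[0, 0], y0, synthetic[0, 1], y1, lam=0.25)
```

Only `natural` and `ufcs` would need to be stored in such a scheme; the larger `synthetic` array is recomputed whenever it is needed, which is the storage saving the abstract refers to.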

Paper Structure (19 sections, 5 equations, 9 figures, 6 tables, 2 algorithms)

Figures (9)

  • Figure 1: Performance vs. compression ratio of SOTA dataset distillation methods (G-VBSM, SRe2L, MTT) on three benchmarks. Performance is measured as the Top-1 accuracy of ResNet-18 (ConvNet128 for MTT) on the respective validation sets, trained from scratch using synthetic datasets. The compression ratio, including the additional soft labels, is the proportion of the distilled dataset size to the original dataset size. The star indicates optimal performance.
  • Figure 2: Left: Overview of dataset distillation paradigms. The first illustrates the traditional "one instance for one class" approach, where each instance is optimized exclusively for its pre-assigned label, creating implicit class barriers. The second illustrates our INFER method, designed for "one instance for ALL classes" distillation. Right: t-SNE visualization comparing the decision boundaries of a traditional approach (SRe2L) with those of our INFER approach. We randomly select seven classes from the CIFAR-100 dataset for the visualization. INFER forms thin and clear decision boundaries among classes, in contrast to the chaotic decision boundaries of the traditional approach.
  • Figure 3: Illustration of the integration process between Universal Feature Compensators (UFCs) and natural data instances as described in \ref{eq:aug}. The integration is performed through a simple addition process. Consequently, only the sets $\mathcal{S}=(\mathcal{P}^k,\mathcal{U}^k)$ need to be stored as the synthetic dataset. The synthetic dataset $\tilde{\mathcal{S}}^k$ is generated on-the-fly during training.
  • Figure 4: Left: The change in feature duplication as ipc increases. To measure the level of feature duplication, we use the average cosine similarity between each pair of synthetic data instances within the same class. A greater value therefore represents higher feature duplication, as SRe2L shows. In contrast, our INFER obtains a lower feature duplication, which is closer to the level observed in natural datasets. Right: The ablation study of UFC. The first two groups are under the $\texttt{ipc}=10$ setting, while the other two are under $\texttt{ipc}=50$. The purple annotations indicate the performance gains contributed by our UFC.
  • Figure 5: Visualizations of loss landscapes in pixel space on the CIFAR-100 dataset. An optimal decision boundary should show a rapid change in cross-entropy loss at the edge, indicating a clear and distinctive decision boundary. Left: A distinctive decision boundary trained on the original dataset ${\mathcal{T}}$. Middle: A less distinctive decision boundary trained on the synthetic dataset of SRe2L, a leading class-specific approach. Right: An improved decision boundary trained on the synthetic dataset of INFER.
  • ...and 4 more figures
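
The feature-duplication measure described in the Figure 4 caption (average cosine similarity between each pair of synthetic instances within the same class) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the helper names are hypothetical, and whether the similarity is computed on raw pixels or on network features is not stated in the caption, so the sketch simply takes one flat vector per instance.

```python
import numpy as np

def mean_pairwise_cosine(vectors):
    """Mean cosine similarity over all distinct pairs of rows.

    vectors: (n, d) array with n >= 2 and no all-zero rows (assumption).
    """
    x = np.asarray(vectors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalise each row
    sim = x @ x.T                                     # (n, n) cosine matrix
    upper = np.triu_indices(len(x), k=1)              # each pair counted once
    return float(sim[upper].mean())

def feature_duplication(vectors, labels):
    """Average the within-class pairwise similarity across classes."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    scores = [
        mean_pairwise_cosine(vectors[labels == c])
        for c in np.unique(labels)
        if np.count_nonzero(labels == c) >= 2         # need at least one pair
    ]
    return float(np.mean(scores))

# Toy example: class 0 holds two identical vectors, class 1 two vectors 45° apart.
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]
score = feature_duplication(vectors, labels)
```

A score near 1 means the instances of a class are nearly identical in direction, which is the high-duplication case the caption attributes to SRe2L; lower scores indicate more diverse instances.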