Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

Lirui Zhao; Yuxin Zhang; Fei Chao; Rongrong Ji

TL;DR

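ELF (EvaLuation with distillation Feature) trains the evaluation model on intermediate-layer features from the distillation model. Because the evaluation model learns from bias-free knowledge, its architecture is no longer constrained while performance is retained. Extensive experiments show that ELF enhances the cross-architecture generalization of current dataset distillation (DD) methods.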

Abstract

The poor cross-architecture generalization of dataset distillation (DD) greatly weakens its practical significance. This paper attempts to mitigate the issue through an empirical study, which suggests that synthetic datasets carry an inductive bias towards the distillation model. As a result, the evaluation model is strictly confined to architectures similar to that of the distillation model. We propose a novel method, EvaLuation with distillation Feature (ELF), which uses features from intermediate layers of the distillation model for cross-architecture evaluation. In this manner, the evaluation model learns from bias-free knowledge, so its architecture is no longer constrained while performance is retained. Extensive experiments show that ELF enhances the cross-architecture generalization of current DD methods. Code for this project is available at https://github.com/Lirui-Zhao/ELF.
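
The abstract describes the evaluation model being guided by intermediate features of the distillation model. As a rough illustration only, the sketch below combines a label loss with a feature-matching term. The mean-squared distance, the single weight `lam`, and the function names are assumptions made for this example, not the loss defined in the paper; the sketch also assumes both feature vectors already have the same size, which in a real cross-architecture setting may require an extra projection step.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def feature_mse(eval_feat, distill_feat):
    """Mean squared distance between two equally sized feature vectors."""
    assert len(eval_feat) == len(distill_feat), "feature sizes must match"
    squared = [(e - d) ** 2 for e, d in zip(eval_feat, distill_feat)]
    return sum(squared) / len(squared)


def guided_loss(logits, label, eval_feat, distill_feat, lam=1.0):
    """Label loss plus a weighted feature-matching term.

    Hypothetical form: `lam` and the MSE distance are illustrative
    choices, not values or definitions taken from the paper.
    """
    return cross_entropy(logits, label) + lam * feature_mse(eval_feat, distill_feat)
```

When the evaluation model's features equal the distillation model's features, the matching term vanishes and only the label loss remains; increasing `lam` shifts the training signal towards the features.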

Paper Structure (14 sections, 9 equations, 5 figures, 12 tables)

Introduction
Related work
Methodology
Background
An Empirical Study of Cross-Architecture Generalization
Evaluation with the Distillation Feature
Experimentation
Experimental Details
Quantitative Comparison
Performance Analysis
Limitation
Conclusion
Experimental Details
More Performance Analysis

Figures (5)

Figure 1: Improvement of the proposed ELF over existing baseline methods, including (a) DSA and (b) MTT. Here, the distillation model is Conv-IN with a width of 128 and a depth of 3. "IN" and "BN" denote instance normalization and batch normalization. The horizontal axis shows the different evaluation models. "Alignment" denotes the case where the evaluation model is the same as the distillation model. Experiments are performed on the CIFAR-100 dataset with 10 images per class (IPC).
Figure 2: Framework of our proposed ELF method. We feed the synthetic dataset to the distillation model and obtain the bias-free intermediate features, which are then used to guide the training process of the evaluation model.
Figure 3: Impacts of different $\lambda_{front}$ and $\lambda_{rear}$ on the test accuracy. CIFAR-100 10 IPC evaluated on ResNet18-IN.
Figure 4: Ablation studies on feature epoch in ELF. The horizontal coordinate denotes the number of epochs learned by the distillation model that generated the required features.
Figure : Performance of different features on different baseline networks. Here, ConvNet denotes ConvNetW512-IN and ResNet denotes ResNet18-IN. CIFAR-100 1/10 IPC, using ZCA preprocessing during distillation. Our default setting is highlighted in gray.
