Improve Cross-Architecture Generalization on Dataset Distillation

Binglin Zhou; Linhao Zhong; Wentao Chen

TL;DR

The paper tackles cross-architecture generalization in dataset distillation by introducing a model pool: the distilled data is trained against multiple, related architectures rather than a single one, which reduces architecture-specific bias. It also integrates knowledge distillation into the distilled-data workflow to improve generalization when the distilled data is used to train unseen models. Empirical results on CIFAR-10 show that combining the model pool with knowledge distillation significantly improves cross-architecture performance over gradient-matching baselines. The approach offers a scalable way to produce more broadly applicable distilled datasets, with practical implications for faster, architecture-agnostic model training.

Abstract

Dataset distillation, a pragmatic approach in machine learning, aims to create a smaller synthetic dataset from a larger existing dataset. However, existing distillation methods primarily adopt a model-based paradigm, in which the synthetic dataset inherits model-specific biases, limiting its generalizability to alternative models. In response to this constraint, we propose a novel methodology termed "model pool". This approach selects models from a diverse model pool according to a specific probability distribution during the dataset distillation process. Additionally, we integrate our model pool with the established knowledge distillation approach and apply knowledge distillation when testing the distilled dataset. Our experimental results validate the effectiveness of the model pool approach when tested across a range of existing models, demonstrating superior performance compared to existing methodologies.
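The selection step described in the abstract can be illustrated with a minimal sketch. It only shows how an architecture might be drawn from a pool according to a probability distribution at each distillation iteration. The architecture names, the probabilities, and the helper names `sample_architecture` and `sample_schedule` are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical pool: names and selection probabilities are illustrative
# assumptions, not values reported in the paper.
MODEL_POOL = {
    "ConvNet": 0.7,
    "AlexNet": 0.1,
    "VGG11": 0.1,
    "ResNet18": 0.1,
}


def sample_architecture(pool, rng):
    """Draw one architecture name according to its selection probability."""
    names = list(pool)
    weights = [pool[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def sample_schedule(pool, num_iterations, seed=0):
    """Return the architecture chosen at each distillation iteration."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    return [sample_architecture(pool, rng) for _ in range(num_iterations)]


if __name__ == "__main__":
    schedule = sample_schedule(MODEL_POOL, num_iterations=1000)
    # Empirical counts should roughly follow the pool's probabilities.
    print(Counter(schedule))
```

In a real distillation loop, each sampled name would be used to instantiate a freshly initialized network of that architecture before the synthetic data is updated. That part is omitted here because it depends on the training framework.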

Paper Structure (17 sections, 2 equations, 1 figure, 5 tables)

This paper contains 17 sections, 2 equations, 1 figure, 5 tables.

Introduction
Related Work
Dataset Distillation
Model Generalization
Knowledge Distillation
Methods
Preliminary: Dataset Distillation Based On Gradient Matching
Model Pool
Knowledge Distillation
Experiments
Settings
Model Pool
Knowledge Distillation
Result Analysis
Conclusions
...and 2 more sections

Figures (1)

Figure 1: Overview of the model pool and knowledge distillation methods. The model pool is a set of models with different architectures. Each model in the pool has its own probability of being chosen. Knowledge distillation is performed on the distilled dataset. The two methods are independent.
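For readers unfamiliar with knowledge distillation, the sketch below shows the widely used temperature-scaled formulation: a KL term between softened teacher and student outputs, combined with a cross-entropy term on the hard label. It is an assumption that the paper uses a loss of this general form. The function names, the temperature, and the weighting `alpha` are hypothetical choices for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard distillation loss for one example (illustrative values).

    alpha weights the soft-target KL term; (1 - alpha) weights the
    hard-label cross-entropy term. Both defaults are assumptions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Cross-entropy of the student against the hard label (temperature 1).
    ce = -math.log(softmax(student_logits)[label])
    # temperature**2 keeps the soft-term gradient scale comparable.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(kd_loss(student, teacher, label=0))
```

In the setting the caption describes, the inputs would be distilled images, with the teacher's outputs guiding a student of a different architecture.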
