
MetaDD: Boosting Dataset Distillation with Neural Network Architecture-Invariant Generalization

Yunlong Zhao, Xiaoheng Deng, Xiu Su, Hongyan Xu, Xiuxing Li, Yijing Liu, Shan You

TL;DR

MetaDD employs an architecture-invariant loss function for multi-architecture feature alignment, which increases meta features and reduces heterogeneous features in distilled data. MetaDD can be seamlessly integrated into any DD methodology.

Abstract

Dataset distillation (DD) entails creating a refined, compact distilled dataset from a large-scale dataset to enable efficient training. A significant challenge in DD is the dependency between the distilled dataset and the neural network (NN) architecture used to distill it. A dataset distilled with one specific architecture often yields diminished training performance when it is used to train other architectures. This paper introduces MetaDD, designed to enhance the generalizability of DD across various NN architectures. Specifically, MetaDD partitions distilled data into meta features (i.e., the data's common characteristics that remain consistent across different NN architectures) and heterogeneous features (i.e., the features unique to each NN architecture). MetaDD then employs an architecture-invariant loss function for multi-architecture feature alignment, which increases meta features and reduces heterogeneous features in the distilled data. As a low-memory component, MetaDD can be seamlessly integrated into any DD methodology. Experimental results demonstrate that MetaDD significantly improves performance across various DD methods. On Tiny-ImageNet distilled with SRe2L (50 images per class, IPC), MetaDD achieves a cross-architecture NN accuracy of up to 30.1%, surpassing the second-best method (GLaD) by 1.7%.
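The architecture-invariant loss can be illustrated with a minimal sketch. Following the idea that distilled data should exhibit low-variance class activation maps (CAMs) across multiple pre-trained NNs, it assumes that each pre-trained network produces a flattened CAM for the same synthetic image. It then min-max normalizes each map so that maps from different architectures are comparable and penalizes the mean per-pixel variance across architectures. The function names, the min-max normalization, and the `weight` term that adds the penalty to an arbitrary DD objective are hypothetical choices for illustration, not the authors' implementation.

```python
from statistics import mean, pvariance


def normalize_cam(cam):
    """Min-max normalize a flattened CAM to [0, 1] (assumed normalization)."""
    lo, hi = min(cam), max(cam)
    if hi == lo:
        # A constant map carries no spatial information; treat it as all zeros.
        return [0.0 for _ in cam]
    return [(v - lo) / (hi - lo) for v in cam]


def arch_invariant_loss(cams):
    """Hypothetical architecture-invariant loss.

    cams: one flattened CAM per pre-trained architecture, all computed for the
    same synthetic image and all of equal length. Returns the mean over pixels
    of the population variance of the normalized CAM values across
    architectures; it is 0 when all architectures attend to the same regions.
    """
    normed = [normalize_cam(c) for c in cams]
    n_pixels = len(normed[0])
    if any(len(c) != n_pixels for c in normed):
        raise ValueError("all CAMs must have the same number of pixels")
    return mean(pvariance([c[i] for c in normed]) for i in range(n_pixels))


def total_loss(dd_loss, cams, weight=0.1):
    """Add the invariance penalty to any DD objective (weight is hypothetical)."""
    return dd_loss + weight * arch_invariant_loss(cams)


# Two architectures that agree vs. two that attend to opposite pixels.
agree = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
disagree = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
print(arch_invariant_loss(agree))     # 0.0
print(arch_invariant_loss(disagree))  # about 0.1667 (i.e., 1/6)
```

In practice the same computation would be written with a tensor library so that gradients flow back to the synthetic images. Plain Python lists are used here only to keep the sketch self-contained.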

Paper Structure (14 sections, 12 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Example distilled images from 128x128 ImageNet (Rows 1 and 2), distilled with MTT [mtt], and 64x64 Tiny-ImageNet (Rows 3 and 4), distilled with DC [dc]. Rows 2 and 4 show the distilled images after they are updated by our method, which substantially improves the cross-architecture utility of dataset distillation with minimal additional memory and training time.
  • Figure 2: Meta and heterogeneous features based on CAM. DM [dm] and DC [dc] are used to distill Tiny-ImageNet, while MTT [mtt] and SRe2L [sre2L] are used to distill ILSVRC2012. An image's meta features are the overlapping regions of the CAMs from different NNs, while its heterogeneous features are the portions of each CAM that remain after the meta features are removed. For every DD method, the synthetic images belong to the same class as the initial images. The distillation architecture is ResNet18, and the cross-architecture models are GoogLeNet and AlexNet.
  • Figure 3: The framework of MetaDD. Our method supervises the synthesis of distilled data during training so that the synthesized data exhibits low-variance CAMs across multiple pre-trained NNs.
  • Figure 4: ViT's validation accuracy on differently erased versions of Tiny-ImageNet. The numbers give the accuracy difference between each erased dataset and the original dataset.
  • Figure 5: In each subplot, the first row displays images generated by the original DD algorithm, while the second row presents images generated after integrating MetaCAM.
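The meta/heterogeneous decomposition described for Figure 2 can be sketched on binary CAM masks. This minimal sketch assumes that a pixel counts as highlighted when its min-max-normalized CAM value is at least a threshold (0.5 here, a hypothetical choice). The meta region is then the intersection of the highlighted pixel sets across architectures, and each architecture's heterogeneous region is what remains of its own set after the meta region is removed. The names `cam_mask` and `split_features` are illustrative and are not taken from the paper.

```python
def cam_mask(cam, threshold=0.5):
    """Pixels whose min-max-normalized activation reaches the (assumed) threshold."""
    lo, hi = min(cam), max(cam)
    if hi == lo:
        # A constant map highlights nothing in particular.
        return set()
    return {i for i, v in enumerate(cam) if (v - lo) / (hi - lo) >= threshold}


def split_features(cams_by_arch, threshold=0.5):
    """Split CAM-highlighted regions into meta and heterogeneous parts.

    Meta region: pixels highlighted by every architecture (the overlap).
    Heterogeneous region of an architecture: its highlighted pixels minus
    the meta region.
    """
    masks = {name: cam_mask(cam, threshold) for name, cam in cams_by_arch.items()}
    meta = set.intersection(*masks.values())
    hetero = {name: mask - meta for name, mask in masks.items()}
    return meta, hetero


# Toy 4-pixel CAMs for one synthetic image (values are made up for illustration).
cams = {
    "ResNet18": [0.9, 0.8, 0.1, 0.7],
    "GoogLeNet": [0.9, 0.2, 0.8, 0.7],
    "AlexNet": [0.8, 0.9, 0.1, 0.6],
}
meta, hetero = split_features(cams)
print(meta)    # {0, 3}
print(hetero)  # {'ResNet18': {1}, 'GoogLeNet': {2}, 'AlexNet': {1}}
```

The architecture names mirror the Figure 2 setup, but the CAM values are invented toy data. In these terms, the architecture-invariant loss aims to enlarge the shared set and shrink each architecture's leftover set.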