Importance-Aware Adaptive Dataset Distillation

Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

TL;DR

Dataset distillation compresses a large dataset into a small, informative set while preserving downstream performance. The authors introduce Importance-Aware Adaptive Dataset Distillation (IADD), which assigns self-adaptive weights to network parameters and minimizes the IADD loss $\mathcal{L}=\frac{\| \tilde{\theta}'_{i,J}-\theta'_{i+K} \|^{2}_{2}}{\| \theta_{i}-\theta_{i+K} \|^{2}_{2}}$ to align student parameters (trained on the distilled data) with teacher parameters (trained on the original data) across iterative updates. The method uses multiple teacher snapshots from the original data, jointly optimizes the learning rate $\alpha$, the weights $\mathcal{W}$, and the distilled dataset $\mathcal{D}_{distill}$, and yields an optimized distilled set $\mathcal{D}_{distill}^*$. Empirically, IADD outperforms state-of-the-art parameter-matching methods on CIFAR-10/100 and Tiny ImageNet, generalizes better across architectures, and proves effective in a real-world COVID-19 chest X-ray task, highlighting its potential for privacy-preserving, data-efficient learning.
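
The normalized loss in the TL;DR can be illustrated with a minimal pure-Python sketch on flat parameter vectors. It is not the authors' implementation. In particular, the function name `iadd_loss` and the element-wise application of the self-adaptive weights (primed parameter = weight × parameter) are assumptions made for illustration; the summary only states that the primed parameters are processed by $\mathcal{W}$.

```python
def iadd_loss(student, teacher_target, teacher_start, weights):
    """Importance-weighted, normalized parameter-matching loss (illustrative).

    student        : student parameters after J updates (theta~_{i,J})
    teacher_target : teacher parameters at step i+K     (theta_{i+K})
    teacher_start  : teacher parameters at step i       (theta_i)
    weights        : self-adaptive weights W, one per parameter

    ASSUMPTION: weights are applied element-wise, theta' = w * theta.
    """
    if not (len(student) == len(teacher_target)
            == len(teacher_start) == len(weights)):
        raise ValueError("all parameter vectors must have the same length")

    # Numerator: squared L2 distance between weighted student and
    # weighted teacher parameters.
    numerator = sum((w * s - w * t) ** 2
                    for w, s, t in zip(weights, student, teacher_target))

    # Denominator: squared L2 distance between the unweighted teacher
    # parameters at steps i and i+K, which normalizes the loss.
    denominator = sum((a - b) ** 2
                      for a, b in zip(teacher_start, teacher_target))
    if denominator == 0.0:
        raise ValueError("teacher start and target parameters are identical")

    return numerator / denominator


# Toy example with two parameters (values are made up for illustration).
theta_i = [0.0, 0.0]         # teacher at step i
theta_i_plus_k = [1.0, 2.0]  # teacher at step i+K
theta_student = [1.0, 1.0]   # student after J updates

uniform = iadd_loss(theta_student, theta_i_plus_k, theta_i, [1.0, 1.0])
downweighted = iadd_loss(theta_student, theta_i_plus_k, theta_i, [1.0, 0.5])
print(uniform, downweighted)
```

With uniform weights the mismatch in the second parameter dominates the loss; halving that parameter's weight shrinks its contribution, which is the effect the importance weights are meant to control. In the method itself the weights are learned jointly with $\alpha$ and $\mathcal{D}_{distill}$ rather than set by hand as in this toy example.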

Abstract

Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite this unprecedented success, large-scale datasets considerably increase storage and transmission costs, resulting in a cumbersome model training process. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, which aims to synthesize a compact dataset that retains the essential information of the large original dataset. State-of-the-art (SOTA) dataset distillation methods match gradients or network parameters obtained during training on real and synthetic datasets. However, the contribution of different network parameters to the distillation process varies, and treating them uniformly degrades distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that improves distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD outperforms other SOTA parameter-matching dataset distillation methods on multiple benchmark datasets, including in terms of cross-architecture generalization. In addition, an analysis of the self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application, COVID-19 detection.

Paper Structure (18 sections, 13 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Visualization of the numerical difference between teacher and student network parameters in corresponding dimensions (a depth-3 ConvNet on CIFAR-10).
  • Figure 2: Concept of the proposed method. IADD trains the student network parameters on a distilled dataset so that they align with the teacher network parameters derived from the original large dataset. Because the parameter pairs in the teacher and student networks differ, these parameters are handled with different importance weights.
  • Figure 3: Overview of the proposed method. The main objective is to match the student network parameters $\tilde{\theta}'_{i,J}$ with the teacher network parameters $\theta'_{i+K}$ using the IADD loss $\mathcal{L}$. $\tilde{\theta}_{i,J}$ and $\theta_{i+K}$ denote the intermediate model parameters before the self-adaptive weights $\mathcal{W}$ are applied. $i$ is the random starting timestamp of the teacher and student parameters. $J$ and $K$ are the numbers of gradient descent updates of the student and teacher parameters, respectively.
  • Figure 4: Distilled CIFAR-10 dataset with IPC = 1 and 10.
  • Figure 5: Distilled CIFAR-100 and Tiny ImageNet datasets (selected images) with IPC = 1.
  • ...and 3 more figures