Understanding Dataset Distillation via Spectral Filtering

Deyu Bo, Songhua Liu, Xinchao Wang

TL;DR

This work proposes UniDD, a spectral filtering framework that unifies diverse dataset distillation objectives by treating each method as a filter on the feature-feature correlation (FFC) matrix $X^{\top}X$ and the feature-label correlation (FLC) matrix $X^{\top}Y$. Under this view, existing methods fall into two groups: low-frequency matching, which encodes global texture, and high-frequency matching, which encodes local detail. Curriculum Frequency Matching (CFM) covers both frequency bands by varying the high-pass parameter $\beta_t$ over the course of distillation. The method computes per-layer FFC/FLC representations with stabilized statistics, uses exponential moving updates to align real and synthetic data, and optimizes a combined loss that includes classification, filter, and signal terms. Experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K show that CFM improves performance and cross-architecture generalization over strong baselines, which supports the practicality of the UniDD perspective.

Abstract

Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a spectral filtering framework that unifies diverse DD objectives. UniDD interprets each DD objective as a specific filter function that affects the eigenvalues of the feature-feature correlation (FFC) matrix and modulates the frequency components of the feature-label correlation (FLC) matrix. In this way, UniDD reveals that the essence of DD fundamentally lies in matching frequency-specific features. Moreover, according to the filter behaviors, we classify existing methods into low-frequency matching and high-frequency matching, encoding global texture and local details, respectively. However, existing methods rely on fixed filter functions throughout distillation, which cannot capture the low- and high-frequency information simultaneously. To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information of the FFC and FLC matrices. Extensive experiments on small-scale datasets, such as CIFAR-10/100, and large-scale datasets, including ImageNet-1K, demonstrate the superior performance of CFM over existing baselines and validate the practicality of UniDD.
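To make the spectral view concrete, the following is a minimal NumPy sketch of applying a filter function to the eigenvalues of the FFC matrix and using it to modulate the FLC matrix. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`filtered_statistics`, `matching_loss`, `high_pass`), the filter form $f(\lambda) = 1/(\lambda + \beta)$, and the squared-Frobenius matching loss are all hypothetical choices made for this example.

```python
import numpy as np

def filtered_statistics(X, Y, f):
    """Apply filter f to the FFC eigenvalues and to the FLC matrix.

    X: (n, d) features, Y: (n, c) labels. Returns f(X^T X) and f(X^T X) X^T Y.
    """
    ffc = X.T @ X                      # feature-feature correlation, (d, d)
    flc = X.T @ Y                      # feature-label correlation, (d, c)
    lam, U = np.linalg.eigh(ffc)       # ffc is symmetric, so eigh applies
    lam = np.clip(lam, 0.0, None)      # guard against tiny negative round-off
    g = f(lam)                         # filter response per eigenvalue
    filtered_ffc = (U * g) @ U.T       # U diag(g) U^T
    filtered_flc = filtered_ffc @ flc  # U diag(g) U^T X^T Y
    return filtered_ffc, filtered_flc

def matching_loss(X_real, Y_real, X_syn, Y_syn, f):
    """Assumed loss: squared Frobenius distance between filtered statistics."""
    ffc_r, flc_r = filtered_statistics(X_real, Y_real, f)
    ffc_s, flc_s = filtered_statistics(X_syn, Y_syn, f)
    return float(np.sum((ffc_r - ffc_s) ** 2) + np.sum((flc_r - flc_s) ** 2))

def high_pass(beta):
    """Hypothetical high-pass filter: large response when the eigenvalue is small."""
    return lambda lam: 1.0 / (lam + beta)

# Toy data: 100 real and 10 synthetic samples with 8 features and 4 classes.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(100, 8))
Y_real = np.eye(4)[rng.integers(0, 4, size=100)]
X_syn = rng.normal(size=(10, 8))
Y_syn = np.eye(4)[rng.integers(0, 4, size=10)]
loss = matching_loss(X_real, Y_real, X_syn, Y_syn, high_pass(beta=0.1))
```

Because both statistics have shapes that depend only on the feature and label dimensions, the real and synthetic sets can differ in size and still be compared directly. Swapping `high_pass(beta)` for another callable changes which part of the spectrum dominates the loss.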

Paper Structure

This paper contains 37 sections, 28 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: Visualization of different filtering functions. A larger eigenvalue indicates a lower frequency. If $f(\lambda)$ has a large value when $\lambda$ is small, then $f(\cdot)$ is a high-pass filter, and vice versa.
  • Figure 2: Synthetic images distilled by different filters.
  • Figure 3: Ablation studies on the choice of filters.
  • Figure 4: Visualization of the images synthesized by different DD methods.
  • Figure 5: Synthetic images of ImageNet-1K. From left to right, the frequency gradually increases.
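
The convention in the Figure 1 caption (a larger eigenvalue indicates a lower frequency, so a filter with a large response at small $\lambda$ is high-pass) can be checked numerically. The sketch below is a rough heuristic written for illustration only: the helper `filter_type` and its two probe eigenvalues are assumptions, and comparing just two points would mislabel a non-monotonic filter.

```python
import numpy as np

def filter_type(f, lam_small=1e-3, lam_large=1e3):
    """Label a filter by comparing its response at a small and a large eigenvalue.

    Larger eigenvalues correspond to lower frequencies, so a filter that
    responds more strongly at lam_small than at lam_large is high-pass.
    """
    small = float(f(lam_small))
    large = float(f(lam_large))
    if np.isclose(small, large):
        return "all-pass"
    return "high-pass" if small > large else "low-pass"

# Illustrative filters (not taken from the paper).
kinds = {
    "1 / (lam + 0.1)": filter_type(lambda lam: 1.0 / (lam + 0.1)),
    "lam": filter_type(lambda lam: lam),
    "constant": filter_type(lambda lam: 1.0),
}
```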