Understanding Dataset Distillation via Spectral Filtering
Deyu Bo, Songhua Liu, Xinchao Wang
TL;DR
This work proposes UniDD, a spectral-filtering framework that unifies diverse dataset distillation objectives by treating each method as a filter on the feature-feature correlation (FFC) matrix $X^{\top}X$ and the feature-label correlation (FLC) matrix $X^{\top}Y$. Under this view, existing methods fall into two groups: low-frequency matching, which encodes global texture, and high-frequency matching, which encodes local detail. The paper introduces Curriculum Frequency Matching (CFM), which covers both frequency bands by varying the high-pass parameter $\beta_t$ over the course of distillation. The method computes per-layer FFC/FLC representations with stabilized statistics, uses exponential moving updates to align real and synthetic data, and optimizes a combined loss with classification, filter, and signal terms. Experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K show that CFM consistently improves accuracy and cross-architecture generalization over strong baselines, supporting the practicality of the UniDD perspective.
Abstract
Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a spectral filtering framework that unifies diverse DD objectives. UniDD interprets each DD objective as a specific filter function that affects the eigenvalues of the feature-feature correlation (FFC) matrix and modulates the frequency components of the feature-label correlation (FLC) matrix. In this way, UniDD reveals that the essence of DD fundamentally lies in matching frequency-specific features. Moreover, according to the filter behaviors, we classify existing methods into low-frequency matching and high-frequency matching, encoding global texture and local details, respectively. However, existing methods rely on fixed filter functions throughout distillation, which cannot capture the low- and high-frequency information simultaneously. To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information of the FFC and FLC matrices. Extensive experiments on small-scale datasets, such as CIFAR-10/100, and large-scale datasets, including ImageNet-1K, demonstrate the superior performance of CFM over existing baselines and validate the practicality of UniDD.
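The filtering view described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes one plausible reading of the abstract: eigendecompose the FFC matrix, apply a scalar filter to its eigenvalues, and use the resulting operator to reweight the FLC matrix. The filter family `ratio_filter`, the linear `beta_schedule`, the per-sample normalization, and the squared-error `matching_loss` are all hypothetical choices made for illustration. Which end of the eigenvalue spectrum counts as low or high frequency follows the paper's convention and is not assumed here.

```python
import numpy as np


def spectral_filter(X, Y, g):
    """Filter the eigenvalues of the FFC matrix and reweight the FLC matrix.

    X: (n, d) features, Y: (n, c) labels, g: function applied to eigenvalues.
    Dividing by n is an assumption so that sets of different sizes compare.
    """
    n = X.shape[0]
    ffc = X.T @ X / n                    # feature-feature correlation, (d, d)
    flc = X.T @ Y / n                    # feature-label correlation, (d, c)
    lam, U = np.linalg.eigh(ffc)         # symmetric eigendecomposition
    lam = np.clip(lam, 0.0, None)        # remove tiny negative round-off
    op = (U * g(lam)) @ U.T              # U diag(g(lam)) U^T
    return op, op @ flc                  # filtered FFC, filtered FLC


def ratio_filter(beta):
    """Hypothetical filter g(lam) = lam / (lam + beta), with beta > 0.

    Large beta suppresses small eigenvalues relative to large ones;
    as beta shrinks, the response flattens toward 1 for all lam > 0.
    """
    return lambda lam: lam / (lam + beta)


def beta_schedule(t, total_steps, beta_start, beta_end):
    """Hypothetical linear curriculum for the filter parameter beta_t."""
    frac = t / max(total_steps - 1, 1)
    return beta_start + frac * (beta_end - beta_start)


def matching_loss(X_real, Y_real, X_syn, Y_syn, beta):
    """Squared error between filtered statistics of real and synthetic data."""
    g = ratio_filter(beta)
    ffc_r, flc_r = spectral_filter(X_real, Y_real, g)
    ffc_s, flc_s = spectral_filter(X_syn, Y_syn, g)
    return float(np.sum((ffc_r - ffc_s) ** 2) + np.sum((flc_r - flc_s) ** 2))


# Toy data: 200 real and 20 synthetic samples, 8 features, 4 classes.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(200, 8))
Y_real = np.eye(4)[rng.integers(0, 4, size=200)]
X_syn = rng.normal(size=(20, 8))
Y_syn = np.eye(4)[rng.integers(0, 4, size=20)]

# Evaluate the loss while the filter parameter moves along the curriculum.
total_steps = 5
for t in range(total_steps):
    beta_t = beta_schedule(t, total_steps, beta_start=1.0, beta_end=0.01)
    loss_t = matching_loss(X_real, Y_real, X_syn, Y_syn, beta_t)
    print(f"step {t}: beta_t={beta_t:.3f}, loss={loss_t:.4f}")
```

In an actual distillation loop the synthetic features would be updated by gradient descent on a loss of this kind. The sketch only evaluates the loss, to show how a changing $\beta_t$ changes which eigen-directions of the FFC matrix dominate the match.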
