Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, Shu-Tao Xia
TL;DR
This work introduces Hierarchical Parameterization Distillation (H-PD), a GAN-based dataset distillation framework that progressively explores the hierarchical feature domains of a pretrained generator when optimizing synthetic data. Rather than distilling in a single fixed feature space, H-PD traverses the generator's layers and uses a class-relevant, CAM-based distance to evaluate synthetic data implicitly. This improves cross-architecture generalization and performance at low images-per-class (IPC) budgets relative to GLaD and diffusion-based methods. The approach combines ensemble-averaged latent initialization with hierarchical layer traversal, yielding faster convergence and better preservation of salient class information in the synthetic samples. Experiments on CIFAR-10, Tiny-ImageNet, and ImageNet-Subset show consistent, significant gains, and ablations isolate the contributions of the unfixed optimization space and the distance-based search strategy. Overall, H-PD treats the optimization space as a dynamic, multi-layered prior, which provides richer guidance for parameterization distillation under extreme compression.
Abstract
Dataset distillation is an emerging dataset reduction method that condenses large-scale datasets while maintaining task accuracy. Current parameterization methods achieve enhanced performance under extremely high compression ratios by optimizing the synthetic dataset in an informative feature domain. However, they restrict distillation to a single fixed optimization space, neglecting the diverse guidance available across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method, dubbed Hierarchical Parameterization Distillation (H-PD), that systematically explores the hierarchical features within a given feature space (e.g., the layers of a pre-trained generative adversarial network). We verify our insights by applying the hierarchical optimization strategy to a GAN-based parameterization method. In addition, we introduce a novel class-relevant feature distance metric that alleviates the computational burden of evaluating synthetic datasets, bridging the gap between the synthetic and original datasets. Experimental results demonstrate that H-PD achieves significant performance improvements under various settings at equivalent time cost, and even surpasses current generative distillation methods based on diffusion models under the extreme compression ratios IPC=1 and IPC=10.
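
The progressive exploration described above can be illustrated with a toy sketch. This is not the authors' implementation: the generator is a hypothetical stack of random tanh layers rather than a pre-trained GAN, the update is a simple random search rather than a gradient-based distillation loss, and `weighted_distance` with its `relevance` vector is only an assumed stand-in for the class-relevant feature distance. For brevity the same distance serves as both the optimization objective and the selection criterion. The initial latent is an average of several random latents, which only loosely mirrors an averaged initialization. All function and variable names are illustrative.

```python
import numpy as np

def make_toy_generator(dims, seed=0):
    """Hypothetical stand-in for a pretrained generator: a stack of tanh layers."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward_from(layers, h, start):
    """Run the generator from layer index `start` to the output."""
    for W in layers[start:]:
        h = np.tanh(h @ W)
    return h

def weighted_distance(output, target, relevance):
    """Squared distance to a target statistic, weighted per feature.
    `relevance` is an assumed placeholder for a CAM-style class-relevance mask."""
    diff = output.mean(axis=0) - target
    return float(np.sum(relevance * diff ** 2))

def hierarchical_search(layers, z, target, relevance, steps=200, sigma=0.05, seed=0):
    """Optimize the latent at each layer in turn and keep the best level found."""
    rng = np.random.default_rng(seed)
    h = z.copy()
    best = {"level": 0, "latent": h.copy(),
            "dist": weighted_distance(forward_from(layers, h, 0), target, relevance)}
    for level in range(len(layers)):
        current = weighted_distance(forward_from(layers, h, level), target, relevance)
        for _ in range(steps):
            # Random-search update: accept a perturbation only if it lowers the distance.
            candidate = h + sigma * rng.normal(size=h.shape)
            d = weighted_distance(forward_from(layers, candidate, level), target, relevance)
            if d < current:
                h, current = candidate, d
        if current < best["dist"]:
            best = {"level": level, "latent": h.copy(), "dist": current}
        # Descend: the optimized latent becomes the input of the next feature domain.
        h = np.tanh(h @ layers[level])
    return best

layers = make_toy_generator([8, 16, 16, 12])
rng = np.random.default_rng(1)
z0 = rng.normal(size=(4, 8)).mean(axis=0, keepdims=True)  # averaged initial latent
target = np.tanh(rng.normal(size=12))                     # toy "real data" statistic
relevance = rng.uniform(size=12)                          # toy relevance weights
initial = weighted_distance(forward_from(layers, z0, 0), target, relevance)
result = hierarchical_search(layers, z0, target, relevance)
print(f"initial distance {initial:.4f}, best {result['dist']:.4f} at level {result['level']}")
```

Because each level starts from the output-equivalent latent of the previous one and only accepts improving moves, the tracked distance never increases as the search descends. That makes it easy to see how an unfixed optimization space can only match or improve on a search confined to the input latent space in this toy setting.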
