Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation

Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, Shu-Tao Xia

TL;DR

This work introduces Hierarchical Parameterization Distillation (H-PD), a GAN-based dataset distillation framework that progressively explores the hierarchical feature domains of a pretrained generator when optimizing synthetic data. Rather than optimizing in a single fixed feature space, H-PD traverses the generator's layers and uses a class-relevant, CAM-based feature distance to evaluate candidate synthetic datasets implicitly. This improves cross-architecture performance and performance at low images-per-class (IPC) budgets relative to GLaD and diffusion-based methods. The approach combines ensemble-averaged latent initialization with hierarchical layer traversal, yielding faster convergence and better preservation of salient class information in synthetic samples. Experiments on CIFAR-10, Tiny-ImageNet, and ImageNet-Subset show consistent, significant gains, and ablations attribute them to the unfixed optimization space and the distance-based search strategy. Overall, H-PD treats the optimization space as a dynamic, multi-layered prior that offers richer guidance for parameterization distillation under extreme compression.

Abstract

Dataset distillation is an emerging dataset reduction method that condenses large-scale datasets while maintaining task accuracy. Current parameterization methods achieve enhanced performance under extremely high compression ratios by optimizing the synthetic dataset in an informative feature domain. However, they limit themselves to a fixed optimization space for distillation, neglecting the diverse guidance available across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method, dubbed Hierarchical Parameterization Distillation (H-PD), that systematically explores hierarchical features within a provided feature space (e.g., layers within pre-trained generative adversarial networks). We verify the correctness of our insights by applying the hierarchical optimization strategy to a GAN-based parameterization method. In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation, bridging the gap between synthetic and original datasets. Experimental results demonstrate that the proposed H-PD achieves a significant performance improvement under various settings with equivalent time consumption, and even surpasses current generative distillation methods based on diffusion models under the extreme compression ratios IPC=1 and IPC=10.

Paper Structure

This paper contains 45 sections, 10 equations, 12 figures, 21 tables, 1 algorithm.

Figures (12)

  • Figure 1: Performance of synthetic datasets condensed from various feature domains provided by GAN under the same settings (DSA on ImageNet-Birds).
  • Figure 2: Comparison between a fixed optimization space and an unfixed optimization space. $\mathcal{S}^{i}$ is the synthetic dataset at optimization step $i$, $\mathcal{S}^{*}$ is the optimal synthetic dataset selected along the optimization path, and $\mathcal{S}_{j}$ is the synthetic dataset optimized in feature domain $j$.
  • Figure 3: Visualization comparison of the synthetic datasets with different distillation methods.
  • Figure 4: The comparison of performance (%) at the same optimization epoch.
  • Figure 5: The relationship between searching basis and performance. Note that higher loss-norm values indicate lower loss values and the same applies to feature distances.
  • ...and 7 more figures
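
The unfixed optimization space described in the abstract and in the Figure 2 caption can be illustrated with a toy sketch. It is not the authors' implementation: the three `layer*` functions are hypothetical stand-ins for a pretrained generator, squared error stands in for the class-relevant feature distance, and finite differences stand in for backpropagation. The sketch optimizes a feature in domain $j$, hands the result to domain $j+1$, and keeps the best sample seen along the path, playing the role of $\mathcal{S}^{*}$.

```python
import math

# Hypothetical stand-in for a pretrained generator: a chain of fixed layers.
def layer0(z):
    return [math.tanh(2.0 * v) for v in z]

def layer1(h):
    return [0.5 * v + 0.1 * v * v for v in h]

def layer2(h):
    return [math.tanh(v) for v in h]

LAYERS = [layer0, layer1, layer2]

def render(feature, start):
    """Push a feature sitting at the input of layer `start` through the remaining layers."""
    for layer in LAYERS[start:]:
        feature = layer(feature)
    return feature

def distance(image, target):
    """Placeholder evaluation distance (squared error), assumed for illustration only."""
    return sum((a - b) ** 2 for a, b in zip(image, target))

def optimize_in_domain(feature, start, target, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent on the feature at layer `start`, using central finite differences."""
    feature = list(feature)
    for _ in range(steps):
        grad = []
        for k in range(len(feature)):
            up = feature[:]
            up[k] += eps
            down = feature[:]
            down[k] -= eps
            diff = distance(render(up, start), target) - distance(render(down, start), target)
            grad.append(diff / (2 * eps))
        feature = [v - lr * g for v, g in zip(feature, grad)]
    return feature

def hierarchical_search(z0, target):
    """Traverse feature domains from the first layer onward, keeping the best sample seen."""
    feature = list(z0)
    best_image = render(feature, 0)
    best_dist = distance(best_image, target)
    history = [best_dist]  # distance before optimization, then after each domain
    for j in range(len(LAYERS)):
        feature = optimize_in_domain(feature, j, target)
        image = render(feature, j)
        d = distance(image, target)
        history.append(d)
        if d < best_dist:
            best_image, best_dist = image, d
        if j + 1 < len(LAYERS):
            feature = LAYERS[j](feature)  # hand the optimized feature to the next domain
    return best_image, best_dist, history

if __name__ == "__main__":
    image, dist, history = hierarchical_search([0.0, 0.0], [0.3, -0.2])
    print(image, dist, history)
```

A fixed-space method corresponds to running `optimize_in_domain` for a single value of `start`. The traversal differs only in that each domain starts from the previous domain's result and that the returned sample is chosen by the evaluation distance rather than taken from the last step.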