
HIERAMP: Coarse-to-Fine Autoregressive Amplification for Generative Dataset Distillation

Lin Zhao, Xinru Jiang, Xi Xiao, Qihui Fan, Lei Lu, Yanzhi Wang, Xue Lin, Octavia Camps, Pu Zhao, Jianyang Gu

TL;DR

Across popular dataset distillation benchmarks, HIERAMP consistently improves validation performance without explicitly optimizing global proximity, demonstrating the importance of semantic amplification for effective dataset distillation.

Abstract

Dataset distillation often prioritizes global semantic proximity when creating small surrogate datasets for the original large-scale ones. However, object semantics are inherently hierarchical. For example, the position and appearance of a bird's eyes are constrained by the outline of its head. Global proximity alone fails to capture how object-relevant structures at different levels support recognition. In this work, we investigate the contributions of hierarchical semantics to effective distilled data. We leverage the vision autoregressive (VAR) model, whose coarse-to-fine generation mirrors this hierarchy, and propose HIERAMP to amplify semantics at different levels. At each VAR scale, we inject class tokens that dynamically identify salient regions and use their induced maps to guide amplification at that scale. This adds only marginal inference cost while steering synthesis toward discriminative parts and structures. Empirically, we find that semantic amplification leads to more diverse token choices in constructing coarse-scale object layouts. Conversely, at fine scales, the amplification concentrates token usage, increasing focus on object-related details. Across popular dataset distillation benchmarks, HIERAMP consistently improves validation performance without explicitly optimizing global proximity, demonstrating the importance of semantic amplification for effective dataset distillation.
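The abstract and the Figure 1 caption describe two steps per VAR scale: a class token whose attention is restricted to that scale's image tokens, and amplification of the most-attended positions through a positive logit bias. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes that scales are square token grids flattened one after another (1x1, 2x2, 3x3, ...), that the class-token scores are given as precomputed query-key logits, and that the bias is added to a hypothetical per-position logit vector. The helper names (`scale_offsets`, `class_token_mask`, `masked_softmax`, `top_k_positions`, `amplify_logits`) and the values of `k` and `bias` are illustrative only.

```python
import math
from typing import List, Sequence

def scale_offsets(scale_sides: Sequence[int]) -> List[int]:
    """Start index of each scale in the flattened token sequence, plus the total length.

    Assumption: scale s contributes side_s * side_s tokens, laid out scale after scale.
    """
    offsets, total = [], 0
    for side in scale_sides:
        offsets.append(total)
        total += side * side
    return offsets + [total]

def class_token_mask(scale_sides: Sequence[int], s: int) -> List[bool]:
    """True where the class token of scale s may attend: only image tokens of scale s."""
    bounds = scale_offsets(scale_sides)
    return [bounds[s] <= j < bounds[s + 1] for j in range(bounds[-1])]

def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed positions; blocked positions receive zero attention."""
    peak = max(x for x, ok in zip(logits, mask) if ok)
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, mask)]
    z = sum(exps)
    return [e / z for e in exps]

def top_k_positions(attn: Sequence[float], mask: Sequence[bool], k: int) -> List[int]:
    """Indices of the k most-attended positions, restricted to the allowed scale."""
    allowed = [j for j, ok in enumerate(mask) if ok]
    return sorted(allowed, key=lambda j: attn[j], reverse=True)[:k]

def amplify_logits(logits: Sequence[float], positions: Sequence[int], bias: float) -> List[float]:
    """Add a positive bias at the selected positions (hypothetical target logits)."""
    out = list(logits)
    for j in positions:
        out[j] += bias
    return out

# Usage: three scales (1x1, 2x2, 3x3) give 14 tokens; scale index 1 covers positions 1..4.
sides = [1, 2, 3]
mask = class_token_mask(sides, 1)
# Hypothetical class-token logits; positions 5..13 score higher but are masked out.
scores = [0.5, 2.0, 0.1, 1.5, -1.0] + [3.0] * 9
attn = masked_softmax(scores, mask)
top = top_k_positions(attn, mask, k=2)
biased = amplify_logits([0.0] * len(scores), top, bias=1.0)
print(top)  # [1, 3]
```

Restricting the selection to the allowed indices keeps amplification inside the current scale even though tokens from other scales carry higher raw scores. The Figure 1 caption does not state which logits receive the bias, so the target of `amplify_logits` is an assumption.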

Paper Structure (29 sections, 14 equations, 10 figures, 9 tables, 1 algorithm)

Figures (10)

  • Figure 1: Overview of the HIERAMP framework. Left: Scale-Restricted Class Token Attention Mask. The class token attends only to image tokens from the corresponding scale, with grey regions indicating blocked attention, producing a scale-specific semantic summary. Right: Multi-Scale Semantic Feature Amplification. The Amplify Algorithm selects the top attention positions from the class-token map at each scale and amplifies them via a positive logit bias, guiding the model to focus on semantically important features during decoding.
  • Figure 2: Impact of attention amplification strategy on token entropy and coverage on ImageNet-1K, IPC = 50. The histogram shows the percentage of classes whose codebook token entropy and coverage increased, decreased, or remained unchanged after amplification at different stages. Amplifying attention at coarse and mid scales promotes diversity, while fine-scale amplification can concentrate attention.
  • Figure 3: Heatmap of unique codebook token occurrences across patch positions and scales on ImageNet-1K, Class 51 – Triceratops, IPC = 50. Darker patches indicate a higher number of unique tokens. The average unique-token count for each scale is shown in the upper-right corner of each heatmap.
  • Figure 4: Example generated images and attention heatmaps of the class token. Our method produces richer object details and quantities, achieves stronger semantic alignment, and enhances object-background dependence.
  • Figure 5: Qualitative comparison of the DiT backbone before and after applying HIERAMP.
  • ...and 5 more figures
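Figures 2 and 3 rest on per-class statistics of the codebook tokens chosen at each scale. The sketch below shows one plausible way to compute them. The captions give no formulas, so every definition here is an assumption. Entropy is taken as the Shannon entropy (in bits) of the empirical token distribution, and coverage as the fraction of codebook entries used at least once. The unique-token count is the number of distinct codebook ids at each patch position across a class's images. A per-class change is bucketed as increased, decreased, or unchanged using a tolerance `tol`. All function names, the codebook size, and the toy token maps are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def token_entropy(tokens: Sequence[int]) -> float:
    """Shannon entropy (bits) of the empirical codebook-token distribution."""
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())

def token_coverage(tokens: Sequence[int], codebook_size: int) -> float:
    """Fraction of codebook entries used at least once (assumed definition)."""
    return len(set(tokens)) / codebook_size

def unique_tokens_per_position(token_maps: Sequence[Sequence[int]]) -> List[int]:
    """Distinct codebook ids at each patch position across images of one class and scale.

    token_maps: one flattened token map per image, all of equal length.
    """
    return [len(set(column)) for column in zip(*token_maps)]

def change_direction(before: float, after: float, tol: float = 1e-6) -> str:
    """Bucket a per-class change the way a Figure 2-style histogram would."""
    if after > before + tol:
        return "increased"
    if after < before - tol:
        return "decreased"
    return "unchanged"

# Usage: three images of one class at a 2x2 scale (4 patch positions).
maps = [[5, 7, 7, 1], [5, 8, 7, 2], [5, 9, 7, 1]]
per_position = unique_tokens_per_position(maps)        # [1, 3, 1, 2]
scale_average = sum(per_position) / len(per_position)  # 1.75
flat = [t for m in maps for t in m]
print(per_position, scale_average, token_coverage(flat, codebook_size=16))
```

Comparing `token_entropy` and `token_coverage` for each class before and after amplification, then passing each pair through `change_direction`, yields the three buckets that a Figure 2-style histogram counts.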