TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

Fengli Ran, Xiao Pu, Bo Liu, Xiuli Bi, Bin Xiao

TL;DR

TGDD addresses a weakness of distribution matching (DM) in dataset distillation: DM is efficient but ignores how feature representations evolve during training. TGDD reframes distribution matching as a trajectory-guided, stage-aware process. It uses precomputed expert trajectories to match feature distributions stage by stage, and adds a distribution constraint that improves inter-class separability, balancing diversity and representativeness. Experiments on ten datasets report state-of-the-art accuracy and strong cross-architecture generalization at low overhead, including a 5.0% accuracy gain on high-resolution ImageNet subsets. The approach offers a practical path to compact yet expressive synthetic data for scalable learning.

Abstract

Dataset distillation compresses large datasets into compact synthetic ones to reduce storage and computational costs. Among the various approaches, distribution matching (DM)-based methods have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of the synthetic data and weakens downstream performance. To address this issue, we propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model's training trajectory. At each training stage, TGDD captures evolving semantics by aligning the feature distributions of the synthetic and original datasets. In addition, it introduces a distribution constraint regularization to reduce class overlap. This design helps the synthetic data preserve both semantic diversity and representativeness, improving performance in downstream tasks. Without additional optimization overhead, TGDD achieves a favorable balance between performance and efficiency. Experiments on ten datasets demonstrate that TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.

Paper Structure

This paper contains 31 sections, 6 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Accuracy, distillation time, and GPU memory comparison on CIFAR-10 under different IPCs. Pretraining time is included (5 trajectories for ours, 100 for FTD). Our method balances performance and cost effectively.
  • Figure 2: t-SNE visualization of synthetic features generated by DM and our method under IPC-50, using pretrained models on CIFAR-10 at different training stages.
  • Figure 3: Illustration of the proposed method. First, N expert trajectories are pretrained on the original dataset, each comprising M network snapshots. Then one snapshot is randomly sampled as the encoder for distribution matching between the original and synthetic datasets, and another snapshot from the expert region is chosen to impose distribution constraints on the synthetic dataset.
  • Figure 4: Effectiveness study of our method. (a) Distribution similarity via feature correlations. (b) Class separability evaluated by pretrained model accuracy across stages. (c) Information density from per-image and dataset-level neuron activation ratios.
  • Figure 5: Distribution of original images and synthetic images on CIFAR-10 with IPC-50.
  • ...and 3 more figures
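
The pipeline described in the Figure 3 caption can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. Snapshots are stood in for by small weight matrices acting as linear encoders with a ReLU. The matching term is the mean-embedding form commonly used in DM-style methods. The source does not give the form of the distribution constraint, so `separation_penalty` (a hinge on distances between class means) is a hypothetical placeholder. Treating the "expert region" as the late snapshots of a trajectory, and drawing both snapshots from the same trajectory, are also assumptions. All function and parameter names (`expert_start`, `lam`, `margin`) are invented for illustration.

```python
import random

def encode(snapshot, x):
    """Apply a hypothetical linear encoder (a weight matrix) followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in snapshot]

def class_means(snapshot, data):
    """Mean feature vector per class; data maps label -> list of input vectors."""
    means = {}
    for label, xs in data.items():
        feats = [encode(snapshot, x) for x in xs]
        means[label] = [sum(col) / len(feats) for col in zip(*feats)]
    return means

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def matching_loss(snapshot, real, synthetic):
    """Mean-embedding matching: per-class squared distance between feature means."""
    mr, ms = class_means(snapshot, real), class_means(snapshot, synthetic)
    return sum(sq_dist(mr[c], ms[c]) for c in real)

def separation_penalty(snapshot, synthetic, margin=1.0):
    """Placeholder constraint (assumption): penalize class means closer than margin."""
    ms = class_means(snapshot, synthetic)
    labels = sorted(ms)
    total = 0.0
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            total += max(0.0, margin - sq_dist(ms[a], ms[b]))
    return total

def tgdd_style_loss(trajectories, real, synthetic, expert_start, lam=0.1, rng=random):
    """One loss evaluation: sample a trajectory, then two snapshots from it."""
    traj = rng.choice(trajectories)
    match_snap = rng.choice(traj)                  # any training stage
    expert_snap = rng.choice(traj[expert_start:])  # assumed late-stage "expert region"
    return (matching_loss(match_snap, real, synthetic)
            + lam * separation_penalty(expert_snap, synthetic))

# Toy usage: one trajectory with two identical identity-matrix snapshots.
identity = [[1.0, 0.0], [0.0, 1.0]]
real = {0: [[1.0, 0.0], [3.0, 0.0]], 1: [[0.0, 2.0]]}
synthetic = {0: [[2.0, 0.0]], 1: [[0.0, 2.0]]}
loss = tgdd_style_loss([[identity, identity]], real, synthetic, expert_start=1)
```

In an actual distillation loop the synthetic images would be updated by gradient descent on this loss with an autodiff framework. The sketch only evaluates the loss, to show how the sampled snapshots enter each term.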