Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz
TL;DR
This work identifies the Accumulated Mismatching Problem (AMP) as a key drawback of fixed-length, long-range dataset distillation and introduces Automatic Training Trajectories (ATT), which adaptively chooses the trajectory length by measuring the distance between the synthetic and expert targets at every candidate step. ATT selects the step $N_{opt}$ that minimizes the distance $e_t = \|\theta'_{i,t}-\theta^*_{i,N_T}\|^2$, which addresses the accumulation of matching errors and improves generalization to unseen architectures. Empirical results on CIFAR-10/100, Tiny ImageNet, and ImageNet subsets show that ATT outperforms prior baselines, especially in cross-architecture generalization, while remaining competitive in storage and computation with existing long-range methods. The approach yields stronger cross-architecture (CA) results and more stable performance under parameter variations. The key quantities are the synthetic trajectory length $N_S$, the expert target step $N_T$, and the matching error $e_t$; ATT replaces the traditional fixed trajectory length with a dynamically selected one.
Abstract
Dataset distillation creates a concise yet informative synthetic dataset that can replace the original dataset for training. Some leading methods in this domain prioritize long-range matching: they unroll training trajectories for a fixed number of steps ($N_S$) on the synthetic dataset so that they align with various expert training trajectories. However, traditional long-range matching methods suffer from an overfitting-like problem: the fixed step count $N_S$ forces the synthetic dataset to conform, with distortion, to the expert training trajectories it has seen, which reduces generality, especially with respect to trajectories from unseen architectures. We refer to this as the Accumulated Mismatching Problem (AMP) and propose a new approach, Automatic Training Trajectories (ATT), which dynamically and adaptively adjusts the trajectory length $N_S$ to address the AMP. Our method outperforms existing methods, particularly in cross-architecture tests. Moreover, owing to its adaptive nature, it exhibits enhanced stability in the face of parameter variations.
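The step-selection idea can be illustrated with a minimal sketch. It assumes the parameters reached after each synthetic training step are available as flat vectors, and it takes the matching error to be the plain squared distance $e_t$ to a single expert target. The function names `squared_distance` and `select_optimal_step` and the toy numbers are hypothetical, chosen only for illustration; the sketch omits any normalization and the outer optimization of the synthetic data, so it should not be read as the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_optimal_step(student_params, expert_target):
    """Pick the candidate step whose parameters lie closest to the expert target.

    student_params[t - 1] holds the (hypothetical) parameters after t steps
    on the synthetic dataset; expert_target is the expert's target parameters.
    Returns (n_opt, errors), where errors[t - 1] is e_t and n_opt is 1-indexed.
    """
    errors = [squared_distance(theta, expert_target) for theta in student_params]
    n_opt = min(range(len(errors)), key=errors.__getitem__) + 1
    return n_opt, errors


# Toy example: a 2-parameter model unrolled for N_S = 4 candidate steps.
trajectory = [[0.0, 0.0], [0.5, 0.4], [0.9, 1.1], [1.6, 1.9]]
target = [1.0, 1.0]

n_opt, errors = select_optimal_step(trajectory, target)
print("e_t per step:", errors)
print("selected step N_opt:", n_opt)
```

In this toy trajectory the parameters overshoot the target at the last step, so a fixed-length scheme that always matches at step 4 would be matching at a worse point than step 3, which is the step the adaptive selection picks.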
