GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang, Peng Sun, Tao Lin
TL;DR
Models trained on distilled datasets with soft labels are highly sensitive to the choice of loss function. GIFT addresses this with a universal, plug-and-play approach that refines the soft labels and trains with a cosine similarity loss. The method is motivated by a mutual-information bound and InfoNCE-based reasoning, and uses hard-label smoothing to increase inter-class dispersion. Empirically, GIFT consistently improves state-of-the-art dataset distillation (DD) methods on Tiny-ImageNet and ImageNet-1K, including with large networks, at near-zero additional cost, and it improves cross-architecture and cross-optimizer generalization. The result is a practical, scalable way to use labels in distilled datasets, with relevance to continual learning and large-scale distillation, backed by both theoretical and empirical support for cosine-based label utilization.
Abstract
Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models. In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. We first conduct a comprehensive comparison of loss functions for soft label utilization in dataset distillation, revealing that models trained on synthetic datasets are highly sensitive to the choice of loss function. This finding highlights the need for a universal loss function for training models on synthetic datasets. Building on these insights, we introduce GIFT, an extremely simple yet surprisingly effective plug-and-play approach that combines soft label refinement with a cosine similarity-based loss function to efficiently leverage full label information. Extensive experiments indicate that GIFT consistently enhances state-of-the-art dataset distillation methods across various dataset scales without incurring additional computational costs. Importantly, GIFT significantly enhances cross-optimizer generalization, an area previously overlooked. For instance, on ImageNet-1K with IPC = 10 (images per class), GIFT improves the state-of-the-art method RDED by 30.8% in cross-optimizer generalization. Our code is available at https://github.com/LINs-lab/GIFT.
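To make the two ingredients named in the abstract concrete, the following is a minimal sketch of soft label refinement followed by a cosine similarity loss. It is not the authors' implementation (see the linked repository for that). The function names, the mixing weight `gamma`, the smoothing value `epsilon`, and the specific form of refinement (a convex mix of the teacher soft label with a label-smoothed hard label) are all assumptions chosen for illustration.

```python
import math


def smooth_hard_label(class_index, num_classes, epsilon=0.1):
    """Label-smoothed one-hot vector: 1 - epsilon on the true class,
    epsilon spread evenly over the remaining classes."""
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if k == class_index else off_value
            for k in range(num_classes)]


def refine_soft_label(soft_label, class_index, gamma=0.1, epsilon=0.1):
    """Assumed refinement: convex mix of the teacher soft label and a
    smoothed hard label. gamma and epsilon are hypothetical defaults."""
    hard = smooth_hard_label(class_index, len(soft_label), epsilon)
    return [gamma * s + (1.0 - gamma) * h for s, h in zip(soft_label, hard)]


def cosine_similarity_loss(prediction, target, eps=1e-12):
    """1 - cos(prediction, target); near 0 when the vectors are aligned,
    1 when they are orthogonal. eps guards against zero-norm inputs."""
    dot = sum(p * t for p, t in zip(prediction, target))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t + eps)


# Usage: a 3-class example with a teacher soft label for true class 0.
teacher_soft_label = [0.7, 0.2, 0.1]
refined = refine_soft_label(teacher_soft_label, class_index=0)

good_prediction = [0.8, 0.1, 0.1]   # student output close to the target
bad_prediction = [0.1, 0.1, 0.8]    # student output favoring the wrong class

print("loss (good):", cosine_similarity_loss(good_prediction, refined))
print("loss (bad): ", cosine_similarity_loss(bad_prediction, refined))
```

Because the loss depends only on the angle between the prediction and the refined label, it is insensitive to the overall scale of either vector, which is one plausible reason a cosine-based objective behaves more uniformly across training setups than scale-dependent losses.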
