Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan
TL;DR
This work introduces long-tailed dataset distillation (LTDD), addressing the breakdown of standard dataset distillation when the target data are highly imbalanced. It identifies two root causes: biased gradients that arise when distilling imbalanced data, and poor tail-class guidance from experts that are themselves biased. To overcome these, the authors propose Distribution-agnostic Matching (DAM), which aligns gradient distributions without propagating the experts' weight imbalance into the synthetic data, and Expert Decoupling (ED), which optimizes the representation (backbone) and classification (classifier) pathways separately, matches them jointly, and provides reliable soft-label initialization. Evaluations on CIFAR-10-LT, CIFAR-100-LT, TinyImageNet-LT, and ImageNet-LT show state-of-the-art results, including lossless performance in some settings and strong cross-architecture generalization. This marks the first effective LTDD method, with robust tail-class performance and practical implications for efficient training on real-world imbalanced data.
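To make "highly imbalanced" concrete, the sketch below assumes the exponential imbalance profile commonly used to build CIFAR-10-LT-style benchmarks (largest class size `n_max`, imbalance factor = largest class size / smallest class size). It contrasts instance-balanced sampling with the class-balanced sampling that decoupled long-tailed training (Kang et al., 2020) typically uses when retraining the classifier. The function names and the example setting are hypothetical. This is not the authors' code, and whether ED uses exactly this sampling scheme is an assumption.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Per-class sample counts under an exponential long-tailed profile.

    Class c (0-indexed) keeps about n_max * imbalance_factor ** (-c / (C - 1))
    samples, so class 0 has n_max and the last class has n_max / imbalance_factor.
    """
    if num_classes < 2 or imbalance_factor < 1:
        raise ValueError("need num_classes >= 2 and imbalance_factor >= 1")
    return [
        round(n_max * imbalance_factor ** (-c / (num_classes - 1)))
        for c in range(num_classes)
    ]


def sampling_probs(counts, class_balanced=False):
    """Probability that a single draw comes from each class.

    Instance-balanced sampling (every image equally likely) picks class c with
    probability n_c / N, so head classes supply most samples in each mini-batch
    (and, if per-sample gradients are of similar size, most of the averaged
    gradient). Class-balanced sampling picks each class with probability 1 / C.
    """
    if class_balanced:
        return [1.0 / len(counts)] * len(counts)
    total = sum(counts)
    return [n / total for n in counts]


# Hypothetical CIFAR-10-LT-like setting: 5000 images in the largest class,
# imbalance factor 100 (largest class / smallest class).
counts = long_tailed_counts(n_max=5000, num_classes=10, imbalance_factor=100)
instance = sampling_probs(counts)
balanced = sampling_probs(counts, class_balanced=True)
print("per-class counts:", counts)
print(f"instance-balanced: head {instance[0]:.3f}, tail {instance[-1]:.4f}")
print(f"class-balanced:    head {balanced[0]:.3f}, tail {balanced[-1]:.3f}")
```

Under instance-balanced sampling the head class is drawn 100 times as often as the tail class in this setting. Class-balanced sampling removes that skew for the classifier without changing the underlying data.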
Abstract
Dataset distillation aims to synthesize a small, information-rich dataset from a large one for efficient model training. However, existing dataset distillation methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this failure, we identify two main causes: 1) the distillation process on imbalanced datasets develops biased gradients, leading to the synthesis of similarly imbalanced distilled datasets; 2) the experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we first propose Distribution-agnostic Matching to avoid directly matching the biased expert trajectories. It still reduces the distance between the student and the biased expert trajectories, while preventing the tail-class bias from being distilled into the synthetic dataset. Moreover, we improve the distillation guidance with Expert Decoupling, which jointly matches the decoupled backbone and classifier to improve tail-class performance and to initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation, marking the first effective effort to distill long-tailed datasets.
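For context on what "directly matching the expert trajectories" usually means, here is a minimal sketch assuming the normalized objective popularized by Matching Training Trajectories (MTT; Cazenavette et al., 2022), which trajectory-matching distillation methods commonly build on. A student network starts at expert checkpoint θ*_t, trains N steps on the synthetic set, and is penalized by its squared distance to the later checkpoint θ*_{t+M}, normalized by how far the expert itself moved. This is the baseline objective, not Distribution-agnostic Matching, whose exact formulation is not given here. The function names are hypothetical, and parameters are flattened into plain Python lists for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened parameter vectors."""
    if len(a) != len(b):
        raise ValueError("parameter vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_matching_loss(student_end, expert_start, expert_target):
    """Normalized trajectory-matching loss in the style of MTT.

    loss = ||student_end - expert_target||^2 / ||expert_start - expert_target||^2

    student_end:   student parameters after N update steps on the synthetic set,
                   starting from expert_start
    expert_start:  expert checkpoint theta*_t
    expert_target: expert checkpoint theta*_{t+M}
    The denominator rescales by how far the expert itself moved, so losses are
    comparable across different segments of the expert trajectory.
    """
    denom = squared_distance(expert_start, expert_target)
    if denom == 0:
        raise ValueError("expert_start and expert_target must differ")
    return squared_distance(student_end, expert_target) / denom


# Toy 2-parameter example (values are made up for illustration).
expert_start = [0.0, 0.0]
expert_target = [1.0, 1.0]
print(trajectory_matching_loss([0.5, 0.5], expert_start, expert_target))  # prints 0.25
```

A student that lands exactly on θ*_{t+M} scores 0, and one that never moves from θ*_t scores 1. When the expert checkpoints come from a model trained on long-tailed data, minimizing this loss pulls the synthetic set toward reproducing that expert's bias, which is the behavior the abstract says DAM is designed to avoid.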
