Condensed Data Expansion Using Model Inversion for Knowledge Distillation
Kuluhan Binici, Shivam Aggarwal, Cihan Acar, Nam Trung Pham, Karianto Leman, Gim Hee Lee, Tulika Mitra
TL;DR
The paper addresses the limited information content of condensed datasets in knowledge distillation (KD) by expanding them with synthetic data generated through model inversion (MI). It introduces a feature-alignment discriminator that conditions the synthetic data on the condensed prototypes, so that synthetic samples follow the underlying data distribution more closely and the domain gap is reduced. Experiments on CIFAR-10/100 and ImageNet-200 show consistent KD improvements, with gains of up to 11.4% over standard MI-based KD, and the method remains effective with as few as one condensed sample per class or with a small number of real samples in few-shot settings. It is compatible with existing MI techniques and strengthens KD for heterogeneous model pairs, in privacy-preserving contexts, and in data-scarce regimes.
Abstract
Condensed datasets offer a compact representation of larger datasets, but training models directly on them or using them to enhance model performance through knowledge distillation (KD) can result in suboptimal outcomes due to limited information. To address this, we propose a method that expands condensed datasets using model inversion, a technique for generating synthetic data based on the impressions of a pre-trained model on its training data. This approach is particularly well-suited for KD scenarios, as the teacher model is already pre-trained and retains knowledge of the original training data. By creating synthetic data that complements the condensed samples, we enrich the training set and better approximate the underlying data distribution, leading to improvements in student model accuracy during knowledge distillation. Our method demonstrates significant gains in KD accuracy compared to using condensed datasets alone and outperforms standard model inversion-based KD methods by up to 11.4% across various datasets and model architectures. Importantly, it remains effective even when using as few as one condensed sample per class, and can also enhance performance in few-shot scenarios where only limited real data samples are available.
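To make the training setup concrete, the following is a minimal sketch of distilling on an expanded set, that is, condensed samples plus synthetic samples. The abstract does not specify the distillation objective, so the sketch assumes the standard temperature-scaled KL divergence between teacher and student outputs. The function names (`softmax`, `kd_loss`, `expanded_batch_loss`), the temperature value, and the representation of each sample as a pair of precomputed logits are illustrative assumptions, not the authors' implementation. The model inversion step that produces the synthetic samples is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Assumed KD objective: KL(teacher || student) on softened outputs,
    scaled by temperature squared (the usual convention)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


def expanded_batch_loss(condensed, synthetic, temperature=4.0):
    """Mean KD loss over the union of condensed and synthetic samples.

    Each sample is a hypothetical (teacher_logits, student_logits) pair.
    """
    batch = condensed + synthetic
    return sum(kd_loss(t, s, temperature) for t, s in batch) / len(batch)


if __name__ == "__main__":
    condensed = [([2.0, 0.0, -1.0], [1.5, 0.2, -0.5])]
    synthetic = [([0.1, 3.0, 0.0], [0.0, 1.0, 0.5]),
                 ([-1.0, 0.0, 2.5], [0.0, 0.0, 1.0])]
    print(expanded_batch_loss(condensed, synthetic))
```

In this sketch the condensed and synthetic samples are weighted equally. A real pipeline could weight the two sources differently, which the abstract does not discuss.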
