Decomposed Distribution Matching in Dataset Condensation
Sahar Rahimi Malakshan, Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Nasser M. Nasrabadi
TL;DR
This paper tackles the gap between efficiency and performance in Dataset Condensation by decomposing distribution matching into content and style. It introduces Style Matching, with an MM term that aligns the first and second moments of feature maps and a CM term that aligns cross-map correlations, and Intra-Class Diversity (ICD), which uses KL-divergence with a kNN constraint to diversify the condensed samples. The condensed dataset is learned by minimizing a joint objective that combines a style term, $L_S = \alpha L_{MM} + L_{CM}$, and a content term, $L_C = \beta L_{ICD} + L_{MMD}$, giving the overall optimization $\mathcal{S}^* = \arg\min_{\mathcal{S}} (\lambda L_S + L_C)$. Across CIFAR10/100, TinyImageNet, ImageNet-1K subsets, and high-resolution datasets, the method improves consistently over DM, works across multiple architectures, and extends to continual learning, while remaining computationally efficient. The work provides a practical framework for producing diverse, style-aligned condensed data suitable for large-scale training and continual learning.
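To make the style term concrete, the following is a minimal pure-Python sketch of how $L_S = \alpha L_{MM} + L_{CM}$ and the combined objective $\lambda L_S + L_C$ could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the function names, the squared-error distance, the averaging over channels, and the use of channel covariance as a stand-in for "cross-map correlation" are all assumptions made here, and a real pipeline would compute these terms on network feature maps inside an autodiff framework.

```python
from statistics import fmean, pvariance

# Assumption: one layer's feature maps are given as a list of channels,
# each channel being a flat list of activations of equal length.

def channel_moments(feature_maps):
    """Return per-channel means and (population) variances."""
    means = [fmean(ch) for ch in feature_maps]
    variances = [pvariance(ch, mu) for ch, mu in zip(feature_maps, means)]
    return means, variances

def moment_matching_loss(real, syn):
    """Hypothetical L_MM: squared gap between first and second moments."""
    r_mean, r_var = channel_moments(real)
    s_mean, s_var = channel_moments(syn)
    n = len(r_mean)
    mean_term = sum((a - b) ** 2 for a, b in zip(r_mean, s_mean)) / n
    var_term = sum((a - b) ** 2 for a, b in zip(r_var, s_var)) / n
    return mean_term + var_term

def channel_covariance(feature_maps):
    """Covariance matrix between channels (stand-in for cross-map correlation)."""
    means = [fmean(ch) for ch in feature_maps]
    centred = [[x - mu for x in ch] for ch, mu in zip(feature_maps, means)]
    n_pos = len(feature_maps[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n_pos for cj in centred]
            for ci in centred]

def correlation_matching_loss(real, syn):
    """Hypothetical L_CM: mean squared gap between channel covariance matrices."""
    r, s = channel_covariance(real), channel_covariance(syn)
    c = len(r)
    return sum((r[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

def style_loss(real, syn, alpha=1.0):
    """L_S = alpha * L_MM + L_CM, as written in the text."""
    return alpha * moment_matching_loss(real, syn) + correlation_matching_loss(real, syn)

def total_loss(l_style, l_content, lam=1.0):
    """lambda * L_S + L_C; the content term L_C is supplied by the caller."""
    return lam * l_style + l_content

real = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # toy "original" feature maps
syn = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]    # toy "condensed" feature maps
print(style_loss(real, real), style_loss(real, syn) > 0.0)
```

In this sketch the style loss vanishes when the synthetic feature maps share the real ones' channel statistics and grows as they diverge; the content term (ICD plus MMD) is left as an input because the text does not specify its exact form.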
Abstract
Dataset Condensation (DC) aims to reduce the training effort of deep neural networks by synthesizing a small dataset that is as effective as the original large dataset. Conventionally, DC relies on a costly bi-level optimization, which limits its practicality. Recent research formulates DC as a distribution matching problem, which circumvents the bi-level optimization. However, this efficiency comes at the cost of DC performance. To investigate this performance degradation, we decompose the dataset distribution into content and style. Our observations reveal two major shortcomings: 1) a style discrepancy between the original and condensed data, and 2) limited intra-class diversity in the condensed dataset. We present a simple yet effective method to match the style information between original and condensed data, employing statistical moments of feature maps as well-established style indicators. Moreover, we enhance intra-class diversity by maximizing the Kullback-Leibler divergence within each synthetic class, i.e., content. We demonstrate the efficacy of our method through experiments on diverse datasets of varying size and resolution, achieving improvements of up to 4.1% on CIFAR10, 4.2% on CIFAR100, 4.3% on TinyImageNet, 2.0% on ImageNet-1K, 3.3% on ImageWoof, 2.5% on ImageNette, and 5.5% in continual learning accuracy.
