DAM: Domain-Aware Module for Multi-Domain Dataset Condensation
Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Sunghyun Baek, Junmo Kim
TL;DR
This work tackles the challenge of condensing multi-domain data by introducing Multi-Domain Dataset Condensation (MDDC) and the Domain-Aware Module (DAM), which embeds domain cues into synthetic samples via learnable spatial masks. DAM is trained with a frequency-based pseudo-domain labeling scheme that does not require explicit domain annotations, and the overall objective combines class discrimination with domain-aware supervision: $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + \lambda \mathcal{L}_{\text{dom}}$. Empirically, MDDC with DAM improves in-domain and cross-domain performance across datasets (CIFAR-10/100, Tiny ImageNet, PACS, VLCS, Office-Home) and architectures (ConvNet, AlexNet, VGG, ResNet-18, ViT), while preserving the images-per-class (IPC) budget and maintaining condensation efficiency. The approach also demonstrates cross-architecture generalization and scalability to larger multi-domain data (DomainNet) with modest overhead. Overall, DAM provides a plug-and-play path to robust condensed data in heterogeneous environments, enabling efficient model training without explicit domain labels.
Abstract
Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. As explicit domain labels are mostly unavailable in real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is active only during the condensation process, so the condensed set keeps the same images per class (IPC) as prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
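To make the idea of frequency-based pseudo-domain labeling more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes grayscale images, a square low-frequency window of hypothetical half-width `radius`, the mean amplitude inside that window as the statistic, and an equal-size split of the sorted statistics into `num_domains` groups. The abstract only states that low-frequency amplitude statistics are used, so the function names, the window shape, and the grouping rule are all illustrative assumptions.

```python
import numpy as np


def low_freq_amplitude_stat(images, radius=4):
    """Mean FFT amplitude inside a centered low-frequency window, per image.

    images: float array of shape (N, H, W); grayscale is an assumption here.
    radius: half-width of the low-frequency window (hypothetical parameter).
    """
    # 2D FFT per image, shifted so the zero frequency sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    amplitude = np.abs(spectrum)
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    # Keep only the (2 * radius + 1)^2 block around the zero frequency.
    window = amplitude[:, cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return window.mean(axis=(1, 2))


def pseudo_domain_labels(images, num_domains=3, radius=4):
    """Assign pseudo-domain ids by sorting the statistic and splitting evenly.

    The equal-size split is an illustrative grouping rule, not taken from the paper.
    """
    stats = low_freq_amplitude_stat(images, radius)
    order = np.argsort(stats, kind="stable")
    labels = np.empty(len(images), dtype=np.int64)
    for domain_id, idx in enumerate(np.array_split(order, num_domains)):
        labels[idx] = domain_id
    return labels
```

In a pipeline of this kind, the resulting labels would stand in for the missing domain annotations when computing a domain-aware term such as $\mathcal{L}_{\text{dom}}$. Images with similar low-frequency amplitude (roughly, similar global style or illumination) end up in the same pseudo-domain, which is the intuition the abstract appeals to.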
