DAM: Domain-Aware Module for Multi-Domain Dataset Condensation

Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Sunghyun Baek, Junmo Kim

TL;DR

This work tackles the challenge of condensing multi-domain data by introducing Multi-Domain Dataset Condensation (MDDC) and the Domain-Aware Module (DAM), which embeds domain cues into synthetic samples via learnable spatial masks. DAM is trained with a frequency-based pseudo-domain labeling scheme that does not require explicit domain annotations, and the overall objective combines class discrimination with domain-aware supervision: $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + \lambda \mathcal{L}_{\text{dom}}$, where $\lambda$ weights the domain term. Empirically, MDDC with DAM improves in-domain and cross-domain performance across datasets (CIFAR-10/100, Tiny ImageNet, PACS, VLCS, Office-Home) and architectures (ConvNet, AlexNet, VGG, ResNet-18, ViT), while keeping the images-per-class (IPC) budget unchanged and maintaining condensation efficiency. The approach also demonstrates cross-architecture generalization and scalability to larger multi-domain data (DomainNet) with modest overhead. Overall, DAM provides a plug-and-play path to robust condensed data in heterogeneous environments, enabling efficient model training without explicit domain labels.

Abstract

Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. As explicit domain labels are mostly unavailable in real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is active only during the condensation process, so the number of images per class (IPC) stays the same as in prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
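
The frequency-based pseudo-domain labeling mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the labels are derived from low-frequency amplitude statistics; the specific choices below (mean amplitude over a centered square window, sorting images by that score, and splitting them into equal-sized groups) are assumptions made for illustration, and the names `low_freq_amplitude`, `pseudo_domain_labels`, `radius`, and `num_domains` are hypothetical rather than taken from the paper.

```python
import numpy as np

def low_freq_amplitude(images, radius=4):
    """Mean FFT amplitude inside a centered low-frequency window, per image.

    images: float array of shape (N, C, H, W).
    radius: half-width of the square window (an assumed hyperparameter).
    """
    # 2D FFT over the spatial axes, shifted so the zero frequency sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    amplitude = np.abs(spectrum)
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    window = amplitude[..., cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    # One scalar score per image: average over channels and the window.
    return window.mean(axis=(1, 2, 3))

def pseudo_domain_labels(images, num_domains, radius=4):
    """Assign pseudo-domain labels by ranking images on their low-frequency score.

    Assumption: images are sorted by score and split into `num_domains`
    groups of (nearly) equal size; group index becomes the pseudo-domain label.
    """
    scores = low_freq_amplitude(images, radius)
    order = np.argsort(scores, kind="stable")
    labels = np.empty(len(images), dtype=np.int64)
    for domain, chunk in enumerate(np.array_split(order, num_domains)):
        labels[chunk] = domain
    return labels
```

In a condensation pipeline, such labels would presumably be computed once per class on the real images and then used as targets for the domain-aware term; that usage is likewise an assumption, not something the abstract specifies.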

Paper Structure

This paper contains 34 sections, 9 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Performance of single- and multi-domain training for existing dataset condensation methods (DC, DM, MTT) on the PACS dataset with 10 images per class. In the single-domain setting, models are trained using only Cartoon domain images, assuming access to explicit domain labels. In contrast, the multi-domain setting trains on the full PACS dataset without domain supervision, reflecting modern datasets. All prior methods show a significant performance drop in the multi-domain setting.
  • Figure 2: Training with DAM combines the class-aware training of prior methods (left) with the proposed domain-aware training (right).
  • Figure 3: Visualization of the final output on CIFAR-10 and PACS under the 10 IPC setting. The images shown are condensed with DC+DAM. More outputs can be found in the supplementary material.
  • Figure 4: Experiment with a varying number of domains $D$ on the CIFAR-10 and PACS datasets under 1 and 10 IPC with DM and DM+DAM.
  • Figure 5: Visualization of the final output and domain masks on CIFAR-10 under the 10 IPC setting. The images shown are condensed with DC+DAM.
  • ...and 3 more figures