UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation
Hengjia Li, Yang Liu, Yuqi Lin, Zhanwei Zhang, Yibo Zhao, Weihang Pan, Tu Zheng, Zheng Yang, Yuchun Jiang, Boxi Wu, Deng Cai
TL;DR
UniHDA tackles multifaceted generative domain adaptation by enabling a pre-trained generator to synthesize hybrid domains that combine attributes from multiple text and image references. It maps all references into a unified CLIP embedding space and forms the hybrid-domain direction through linear interpolation of target-domain directions, guided by a multi-modal direction loss. A cross-domain spatial structure loss based on Dino-ViT preserves fine-grained spatial information to maintain consistency with the source domain. The framework is generator-agnostic, validated on 2D and 3D generators as well as diffusion models, and demonstrates strong cross-domain consistency and attribute inheritance across image-image, text-text, and image-text tasks with substantial efficiency gains.
Abstract
Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods adapt the generator to only a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they fail to maintain consistency with the source domain, which impedes the inheritance of its diversity. In this paper, we propose UniHDA, a \textbf{unified} and \textbf{versatile} framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use a CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure \textbf{consistency} with the source domain, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between the source and target generators. Experiments show that the adapted generator can synthesise realistic images with various attribute compositions. Additionally, our framework is generator-agnostic and applicable to multiple generators, e.g., StyleGAN, EG3D, and Diffusion Models.
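The linear interpolation of direction vectors described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for CLIP embeddings of the source domain and of two target references, and the function names (`unit`, `hybrid_direction`), the L2 normalisation, and the constraint that the weights sum to 1 are all assumptions made for illustration.

```python
import numpy as np

def unit(v):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def hybrid_direction(source_emb, target_embs, weights):
    """Linearly interpolate per-domain directions into one hybrid direction.

    source_emb : (d,)   embedding of the source domain
    target_embs: (k, d) embeddings of k target references (text or image)
    weights    : (k,)   interpolation coefficients (assumed to sum to 1)
    """
    source_emb = unit(source_emb)
    target_embs = unit(target_embs)
    weights = np.asarray(weights, dtype=np.float64)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    # One direction per target domain: target embedding minus source embedding.
    directions = target_embs - source_emb      # shape (k, d)
    # Weighted sum of the per-domain directions gives the hybrid direction.
    return weights @ directions                # shape (d,)

# Placeholder embeddings; in practice these would come from a CLIP encoder.
rng = np.random.default_rng(0)
d = 512
source = rng.normal(size=d)
targets = rng.normal(size=(2, d))
hybrid = hybrid_direction(source, targets, [0.5, 0.5])
```

Because the references share one embedding space, the same function applies whether a row of `target_embs` comes from a text prompt or an image. Setting one weight to 1 and the rest to 0 reduces the sketch to ordinary single-domain adaptation.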
