One-for-All: Towards Universal Domain Translation with a Single StyleGAN
Yong Du, Jiahui Zhan, Xinzhe Li, Junyu Dong, Sheng Chen, Ming-Hsuan Yang, Shengfeng He
TL;DR
This work tackles universal domain translation across visually distinct domains with limited data. It introduces UniTranslator, a hybrid framework that uses CLIP as a domain-neutral bridge and a CLIP2P mapper to align CLIP embeddings with StyleGAN's latent space, enabling high-quality translations between far-apart domains. The key innovations are a decoupling module that extracts domain-agnostic semantics and a non-linear CLIP2P mapper that bridges CLIP to StyleGAN’s $P$ space, guided by a suite of losses that preserve cross-domain correspondences and visual fidelity. Extensive experiments show that UniTranslator outperforms state-of-the-art learning-based and diffusion-based methods in image quality, domain relevance, and diversity, while remaining robust to degradation and suitable for applications such as style mixing and stylization. The approach offers a practical path toward universal domain translation across diverse visual domains with a single StyleGAN; the authors plan to release the code and models publicly.
Abstract
In this paper, we propose UniTranslator, a novel translation model for transforming representations between visually distinct domains when training data is limited and the visual gap between domains is large. The main idea is to leverage the domain-neutral capabilities of CLIP as a bridging mechanism, while a separate module extracts abstract, domain-agnostic semantics from the embeddings of both the source and target domains. Fusing these abstract semantics with target-specific semantics yields a transformed embedding within the CLIP space. To bridge the gap between CLIP and StyleGAN, we introduce a new non-linear mapper, the CLIP2P mapper. Taking CLIP embeddings as input, this module is tailored to approximate the latent distribution in StyleGAN's latent space, effectively acting as a connector between the two spaces. The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translation, even in visually challenging scenarios across different visual domains. Notably, UniTranslator generates high-quality translations that exhibit domain relevance, diversity, and improved image quality. UniTranslator surpasses existing general-purpose models and performs well against specialized models in representative tasks. The source code and trained models will be released to the public.
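To make the data flow concrete, the following is a minimal, illustrative sketch of the pipeline outlined above: fuse domain-agnostic and target-specific semantics in a CLIP-like space, map the result to a StyleGAN-like latent code with a small non-linear mapper, and convert that code for the generator. It is not the authors' implementation. The names `fuse`, `Clip2PMapperSketch`, `p_to_w`, and `w_to_p` are hypothetical, the additive fusion rule and the two-layer architecture are assumptions, the weights are random rather than trained, and the dimensions are toy values. The sketch also assumes the definition of $P$ space used in prior StyleGAN embedding work (the $W$ space with the mapping network's final leaky ReLU of slope 0.2 inverted), which the text here does not spell out.

```python
import math
import random


def leaky_relu(x, slope):
    """Element-wise leaky ReLU on a list of floats."""
    return [v if v >= 0.0 else slope * v for v in x]


def linear(x, weight, bias):
    """Affine map; `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def l2_normalize(x):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0.0 else list(x)


def fuse(abstract_sem, target_sem):
    """HYPOTHETICAL fusion rule (assumption): add the two vectors, then renormalize."""
    return l2_normalize([a + t for a, t in zip(abstract_sem, target_sem)])


class Clip2PMapperSketch:
    """Toy two-layer non-linear mapper from a CLIP-like embedding to a P-like code.

    The architecture and the random, untrained weights are placeholders.
    """

    def __init__(self, clip_dim, hidden_dim, latent_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            scale = 1.0 / math.sqrt(cols)
            return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

        self.w1, self.b1 = init(hidden_dim, clip_dim), [0.0] * hidden_dim
        self.w2, self.b2 = init(latent_dim, hidden_dim), [0.0] * latent_dim

    def __call__(self, clip_embedding):
        hidden = leaky_relu(linear(clip_embedding, self.w1, self.b1), 0.2)
        return linear(hidden, self.w2, self.b2)


def p_to_w(p_code, slope=0.2):
    """ASSUMPTION: W is recovered from P by re-applying the leaky ReLU (slope 0.2)."""
    return leaky_relu(p_code, slope)


def w_to_p(w_code, slope=0.2):
    """Inverse of p_to_w: a leaky ReLU with the reciprocal slope."""
    return leaky_relu(w_code, 1.0 / slope)


# Toy usage with random stand-ins for CLIP-space semantics (8-D instead of 512-D).
rng = random.Random(1)
abstract_sem = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(8)])
target_sem = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(8)])

mapper = Clip2PMapperSketch(clip_dim=8, hidden_dim=16, latent_dim=4)
p_code = mapper(fuse(abstract_sem, target_sem))  # code in the P-like space
w_code = p_to_w(p_code)                          # code a StyleGAN synthesis net would take
```

In a real system the stand-in vectors would come from a CLIP encoder and the decoupling module, the mapper would be trained with the losses the paper describes, and `w_code` would be passed to a pretrained StyleGAN synthesis network.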
