StyleAlign: Analysis and Applications of Aligned StyleGAN Models
Zongze Wu, Yotam Nitzan, Eli Shechtman, Dani Lischinski
TL;DR
The paper analyzes aligned StyleGAN2 models, in which a child network is obtained by fine-tuning a parent network to a new domain. It finds that the latent spaces $\mathcal{W}$ and $\mathcal{S}$ retain rich semantics after fine-tuning and that most of the changes occur in the feature convolution layers. Semantic alignment holds across both related and distant domains; some directions appear to be forgotten in the child but can be recovered when it is retrained toward the parent. Building on these insights, the authors present simple applications: cross-domain image translation, automatic morphing, and zero-shot transfer tasks, often achieving state-of-the-art results with minimal task-specific engineering. The work provides a thorough empirical study of alignment, introduces practical inversion and interpolation techniques, and releases resources to facilitate reproduction and further exploration of aligned generative models.
Abstract
In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.
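The idea of reusing one latent code across a parent and its fine-tuned child can be illustrated with a deliberately tiny toy. The sketch below is not the authors' implementation: the two "generators" are hypothetical 2x2 linear maps (`PARENT`, `CHILD`), inversion is a closed-form linear solve rather than an optimization or encoder, and the weight blending used for morphing is an illustrative assumption. All names (`translate`, `morph`, `blend`) are made up for this example.

```python
# Toy stand-ins for aligned generators: each maps a 2-D latent w to a 2-D "image".
# ASSUMPTION: real StyleGAN generators are deep networks; linear maps are used
# here only so that inversion has an exact closed form.
PARENT = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical parent weights
CHILD = [[2.0, 1.0], [0.0, 3.0]]    # hypothetical weights after "fine-tuning"


def matvec(m, v):
    """Apply a generator (matrix m) to a latent code v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def invert_2x2(m, x):
    """Recover w with m @ w == x; a stand-in for GAN inversion."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * x[0] - b * x[1]) / det, (-c * x[0] + a * x[1]) / det]


def blend(parent, child, t):
    """Linearly interpolate weights: t=0 gives the parent, t=1 the child."""
    return [[(1 - t) * p + t * c for p, c in zip(pr, cr)]
            for pr, cr in zip(parent, child)]


def translate(x_parent):
    """Invert in the parent domain, then decode the same latent with the child."""
    w = invert_2x2(PARENT, x_parent)
    return matvec(CHILD, w)


def morph(x_parent, steps):
    """Decode one fixed latent with a sequence of blended generators."""
    w = invert_2x2(PARENT, x_parent)
    return [matvec(blend(PARENT, CHILD, i / (steps - 1)), w)
            for i in range(steps)]


if __name__ == "__main__":
    print(translate([4.0, 1.0]))
    for frame in morph([4.0, 1.0], 3):
        print(frame)
```

In this toy, translation and morphing both rely on the same recovered latent; only the generator weights differ, which mirrors the parent/child relationship described in the abstract.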
