StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN
Andrzej Bedychaj, Jacek Tabor, Marek Śmieja
TL;DR
StyleAE is a lightweight AutoEncoder plugin that attaches to a pre-trained StyleGAN and enables targeted attribute manipulation directly in the StyleGAN latent space $W$. It learns a structured mapping from $w \in W$ to a latent $(C,S)$, where $C=(C_1,\ldots,C_K)$ encodes the $K$ labeled attributes and $S$ captures the remaining information. The encoder $\mathcal{E}$ extracts attributes from $w$, and the decoder $\mathcal{D}$ reconstructs $w$ so that image quality is preserved. Training uses a dual loss that penalizes reconstruction error and deviations from the target attributes, with a specialized $d_A^S$ term for binary attributes. Compared to flow-based approaches such as StyleFlow and PluGeN, StyleAE is simpler and significantly more computationally efficient, yet it achieves comparable attribute manipulation accuracy and better preservation of other image features on the FFHQ and AFHQv2 datasets. The results show that a straightforward AutoEncoder framework can deliver effective, controllable attribute edits with faster training and inference, which makes attribute-conditioned image editing more practical across diverse domains. Future work includes improving latent-space disentanglement and extending StyleAE to generative backbones other than StyleGAN.
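The encode, edit, decode workflow and the two-part loss summarized above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the helper names (`split_latent`, `style_ae_loss`, `edit_attribute`), the toy scaling encoder and decoder, and the `attr_weight` parameter are hypothetical, and a plain squared error stands in for the attribute term because the exact form of $d_A^S$ is not given here.

```python
from typing import Callable, List, Sequence, Tuple

Mapping = Callable[[Sequence[float]], List[float]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def split_latent(z: Sequence[float], k: int) -> Tuple[List[float], List[float]]:
    """Split the AutoEncoder latent into attributes C (first k entries) and style S."""
    return list(z[:k]), list(z[k:])


def style_ae_loss(w: Sequence[float], labels: Sequence[float],
                  encode: Mapping, decode: Mapping,
                  k: int, attr_weight: float = 1.0) -> float:
    """Reconstruction error on w plus a penalty tying C to the target labels.

    Assumption: squared error is used for the attribute term as a stand-in;
    it is not the d_A^S term of the paper.
    """
    z = encode(w)
    c, _ = split_latent(z, k)
    reconstruction = mse(decode(z), w)
    attribute = mse(c, labels)
    return reconstruction + attr_weight * attribute


def edit_attribute(w: Sequence[float], index: int, value: float,
                   encode: Mapping, decode: Mapping, k: int) -> List[float]:
    """Encode w, overwrite attribute C_index, and decode back to the W space."""
    if not 0 <= index < k:
        raise IndexError("index must address one of the k attribute coordinates")
    z = list(encode(w))
    z[index] = value
    return decode(z)


# Hypothetical toy encoder/decoder pair (exact inverses), for illustration only.
def toy_encode(w: Sequence[float]) -> List[float]:
    return [2.0 * x for x in w]


def toy_decode(z: Sequence[float]) -> List[float]:
    return [0.5 * x for x in z]


w = [1.0, 0.0, 0.5, -0.5]
loss = style_ae_loss(w, labels=[1.0, 0.0], encode=toy_encode, decode=toy_decode, k=2)
edited = edit_attribute(w, index=0, value=-2.0, encode=toy_encode, decode=toy_decode, k=2)
print(loss, edited)
```

In this toy setting only the first coordinate of `w` changes after the edit, which mirrors the intended behavior: changing one $C_k$ should leave the information carried by $S$ and the other attributes intact. In practice the encoder and decoder would be trained neural networks, and the edited $w$ would be passed to the StyleGAN generator.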
Abstract
Deep conditional generative models are excellent tools for creating high-quality images and editing their attributes. However, training modern generative models from scratch is very expensive and requires large computational resources. In this paper, we introduce StyleAutoEncoder (StyleAE), a lightweight AutoEncoder module, which works as a plugin for pre-trained generative models and allows for manipulating the requested attributes of images. The proposed method offers a cost-effective solution for training deep generative models with limited computational resources, making it a promising technique for a wide range of applications. We evaluate StyleAutoEncoder by combining it with StyleGAN, which is currently one of the top generative models. Our experiments demonstrate that StyleAutoEncoder is at least as effective in manipulating image attributes as the state-of-the-art algorithms based on invertible normalizing flows. However, it is simpler, faster, and gives more freedom in designing neural
