StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human
Takato Yoshikawa, Yuki Endo, Yoshihiro Kanamori
TL;DR
This work introduces StyleHumanCLIP, a text-guided garment editing framework for StyleGAN-Human that preserves identity while editing full-body garments. The core idea is an attention-based latent code mapper that uses cross-attention between latent codes and CLIP text embeddings to generate a latent residual $\nabla w$, which is added to the input latent code $w$ in the $W^+$ space to obtain $w'$. To constrain edits to garment regions, the method employs feature-space masking, computing masks from a human parsing model and blending feature maps via the mask $M = P_t(G(w)) \cup P_t(G(w'))$. The approach is trained with CLIP-based losses, including a directional CLIP component, and evaluated against StyleCLIP, HairCLIP+, and diffusion-based methods, showing improved text fidelity and identity preservation, including applicability to real images via GAN inversion. Overall, StyleHumanCLIP advances text-based editing for full-body humans by integrating an attention-enabled latent mapper with inference-time masking, offering practical garment manipulation with preserved subject identity.
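The feature-space masking step can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `union_mask` and `blend_features` are hypothetical, the parsing maps are assumed to be integer label images already resized to the feature-map resolution, and the blending direction (edited features inside the mask, original features outside) is inferred from the stated goal of confining edits to garment regions.

```python
import numpy as np

def union_mask(parse_orig, parse_edit, target_labels):
    """Pixels labeled as a target garment class in either parsing map,
    i.e. M = P_t(G(w)) ∪ P_t(G(w')). Inputs are (H, W) integer label maps."""
    m_orig = np.isin(parse_orig, target_labels)
    m_edit = np.isin(parse_edit, target_labels)
    return np.logical_or(m_orig, m_edit)

def blend_features(feat_orig, feat_edit, mask):
    """Blend (C, H, W) feature maps: edited features inside the mask,
    original features outside (assumed direction, see lead-in)."""
    m = mask.astype(feat_orig.dtype)[None, :, :]  # broadcast over channels
    return m * feat_edit + (1.0 - m) * feat_orig

# Toy example: label 1 stands in for the target garment class (hypothetical).
parse_orig = np.array([[0, 1], [0, 0]])
parse_edit = np.array([[0, 0], [1, 0]])
mask = union_mask(parse_orig, parse_edit, [1])

feat_orig = np.zeros((3, 2, 2))
feat_edit = np.ones((3, 2, 2))
blended = blend_features(feat_orig, feat_edit, mask)
```

Taking the union rather than only the original garment region lets the edit cover pixels the new garment occupies (for example, when a short sleeve becomes a long one), while everything outside both regions stays tied to the original features.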
Abstract
This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. Our latent code mapper adopts an attention mechanism that adaptively manipulates individual latent codes on different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to avoid unwanted changes caused by text inputs. Our quantitative and qualitative evaluations show that our method controls generated images more faithfully to the given texts than existing methods.
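To make the idea of an attention mechanism that manipulates individual per-layer latent codes more concrete, the following NumPy sketch shows one generic form of cross-attention producing a latent residual. Everything specific here is an assumption for illustration: the number of layers (18), the dimensions, the use of several text embedding vectors, the choice of latent codes as queries and text embeddings as keys and values, and the random projection matrices `Wq`, `Wk`, `Wv`, which stand in for learned weights. The actual mapper architecture may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_residual(w, text_emb, Wq, Wk, Wv):
    """Single-head cross-attention.
    w:        (L, d)   one latent code per generator layer (queries)
    text_emb: (T, d_t) text embedding vectors (keys and values)
    Returns a (L, d) residual and the (L, T) attention weights."""
    q = w @ Wq                                  # (L, d_k)
    k = text_emb @ Wk                           # (T, d_k)
    v = text_emb @ Wv                           # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn

# Hypothetical sizes and random stand-ins for learned parameters.
rng = np.random.default_rng(0)
L, T, d, d_t, d_k = 18, 4, 512, 512, 64
w = rng.standard_normal((L, d))
text_emb = rng.standard_normal((T, d_t))
Wq = rng.standard_normal((d, d_k)) * 0.02
Wk = rng.standard_normal((d_t, d_k)) * 0.02
Wv = rng.standard_normal((d_t, d)) * 0.02

residual, attn = cross_attention_residual(w, text_emb, Wq, Wk, Wv)
w_edit = w + residual  # edited latent codes, one per layer
```

Because each layer's latent code forms its own query, each layer receives its own attention weights and therefore its own residual, which is one way a mapper can adapt the edit per layer rather than applying a single shared offset.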
