Fashion Style Editing with Generative Human Prior
Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang
TL;DR
This work tackles fashion style editing on full-body human images using text prompts. It introduces FaSE, a framework built on a StyleGAN-Human prior that uses three latent-mapper branches ($M^c$, $M^m$, $M^f$) in the $W^+$ space, guided by CLIP loss and regularization. To overcome the limitations of naive CLIP guidance, it employs textual augmentation via an LLM and visual reference retrieval from a curated fashion image database, incorporating $\text{L}_{CLIP}$ and $\text{L}_{Ref}$ losses, with references inverted into $W^+$ to provide vivid, illustrative signals. The approach also analyzes the hierarchical latent space to assign fashion edits to the appropriate levels (pose, garment shape, texture), and experiments show improvements over StyleCLIP in both qualitative and quantitative assessments, supported by human and AI-driven evaluations. Overall, FaSE enables robust, flexible, and interpretable fashion style edits with practical implications for fashion imaging and digital garment design.
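The three-branch mapper and the combined objective summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and everything specific in it is an assumption: the 18-layer $W^+$ depth, the 4/4/10 layer split (borrowed from StyleCLIP's coarse/medium/fine convention), the step size, the loss weights `lam_reg` and `lam_ref`, and in particular the form of the reference term, which is taken here to be a mean squared distance in $W^+$ to the inverted reference latents. The CLIP embeddings are passed in as plain vectors rather than computed by a real CLIP model.

```python
import math

NUM_LAYERS = 18  # assumed W+ depth; the real value depends on the generator
# Assumed routing of W+ layers to the branches M^c, M^m, M^f.
GROUPS = {"c": range(0, 4), "m": range(4, 8), "f": range(8, NUM_LAYERS)}


def group_of(layer_idx):
    """Return the branch name ('c', 'm' or 'f') responsible for a W+ layer."""
    for name, layers in GROUPS.items():
        if layer_idx in layers:
            return name
    raise ValueError(f"layer index {layer_idx} is outside W+")


def edit_latent(w_plus, mappers, strength=0.1):
    """Apply w' = w + strength * M(w), using the branch assigned to each layer."""
    edited = []
    for i, w in enumerate(w_plus):
        delta = mappers[group_of(i)](w)
        edited.append([wi + strength * di for wi, di in zip(w, delta)])
    return edited


def sq_dist(a, b):
    """Squared L2 distance between two W+ codes (lists of per-layer vectors)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def cosine_distance(u, v):
    """1 - cosine similarity, the usual shape of a CLIP guidance term."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def total_loss(img_emb, txt_emb, w_edit, w_src, w_refs, lam_reg=0.8, lam_ref=0.5):
    """Weighted sum of a CLIP term, a regularizer and a reference term.

    The reference term is a hypothetical stand-in for L_Ref.
    """
    l_clip = cosine_distance(img_emb, txt_emb)
    l_reg = sq_dist(w_edit, w_src)  # keep the edit close to the source latent
    l_ref = sum(sq_dist(w_edit, r) for r in w_refs) / len(w_refs)
    return l_clip + lam_reg * l_reg + lam_ref * l_ref


# Toy usage: only the middle branch proposes a change.
w_src = [[0.0] * 4 for _ in range(NUM_LAYERS)]
mappers = {
    "c": lambda w: [0.0] * len(w),
    "m": lambda w: [1.0] * len(w),
    "f": lambda w: [0.0] * len(w),
}
w_edit = edit_latent(w_src, mappers)
```

In this toy run only layers 4 to 7 move, which mirrors the idea of assigning an edit to one level of the latent hierarchy while leaving the others untouched. In a real setup the lambdas would be trained networks and the embeddings would come from a CLIP image and text encoder.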
Abstract
Image editing has been a long-standing challenge in the research community, with far-reaching impact on numerous applications. Recently, text-driven methods have started to deliver promising results in domains like human faces, but their application to more complex domains has been relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashion style of human imagery using text descriptions. Specifically, we leverage a generative human prior and achieve fashion style editing by navigating its learned latent space. We first verify that existing text-driven editing methods fall short for our problem due to their overly simplified guidance signal, and propose two directions to reinforce the guidance: textual augmentation and visual referencing. Combined with our empirical findings on the latent space structure, our Fashion Style Editing framework (FaSE) successfully projects abstract fashion concepts onto human images and introduces exciting new applications to the field.
