Multi-Directional Subspace Editing in Style-Space

Chen Naveh, Yacov Hel-Or

TL;DR

MDSE addresses the challenge of disentangled, multi-attribute editing in StyleGAN by decomposing the latent space $\mathcal{W}^+$ into orthogonal subspaces, each tied to a specific attribute. It introduces an orthogonality loss and a mixing loss to support multi-directional edits within each subspace while preserving other attributes, and it demonstrates superior disentanglement and identity preservation compared with leading baselines. Quantitative metrics and ablation studies confirm the importance of explicit subspace orthogonality for reducing entanglement and artifacts during sequential edits. The work has practical impact for controllable, high-fidelity face editing and suggests directions for integrating such structure into generator training for even stronger disentanglement.

Abstract

This paper describes a new technique for finding disentangled semantic directions in the latent space of StyleGAN. Our method identifies meaningful orthogonal subspaces that allow editing of one human face attribute, while minimizing undesired changes in other attributes. Our model is capable of editing a single attribute in multiple directions, resulting in a range of possible generated images. We compare our scheme with three state-of-the-art models and show that our method outperforms them in terms of face editing and disentanglement capabilities. Additionally, we suggest quantitative measures for evaluating attribute separation and disentanglement, and exhibit the superiority of our model with respect to those measures.
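
To make the idea of orthogonal attribute subspaces concrete, the following is a minimal sketch on a toy 4-dimensional latent vector, not the authors' implementation. The basis vectors, the attribute names `age_basis` and `smile_basis`, and the squared inner-product form of `orthogonality_penalty` are all assumptions chosen for illustration; the summary above does not give the exact loss used in the paper.

```python
# Illustrative sketch only: a toy 4-D "latent space" split into two 2-D
# attribute subspaces. All names and the penalty form are assumptions.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonality_penalty(basis_a, basis_b):
    """Sum of squared inner products between the basis vectors of two
    subspaces. It is zero exactly when the subspaces are orthogonal."""
    return sum(dot(u, v) ** 2 for u in basis_a for v in basis_b)

def project(w, basis):
    """Coordinates of w along each (orthonormal) basis vector."""
    return [dot(w, b) for b in basis]

def edit(w, basis, coeffs):
    """Move w inside one subspace: w + sum_k coeffs[k] * basis[k].
    Several coefficients mean several edit directions per attribute."""
    out = list(w)
    for c, b in zip(coeffs, basis):
        out = [x + c * y for x, y in zip(out, b)]
    return out

# Hypothetical attribute subspaces (orthonormal and mutually orthogonal).
age_basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
smile_basis = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# A basis that overlaps with age_basis, to show a non-zero penalty.
entangled_basis = [[1.0, 0.0, 1.0, 0.0]]

w = [0.5, -1.0, 2.0, 0.25]
w_edited = edit(w, age_basis, [0.3, -0.7])  # two directions, one attribute
```

In this toy setting, editing within `age_basis` changes the age coordinates of `w` but leaves `project(w, smile_basis)` untouched, which is the behavior the orthogonality constraint is meant to encourage. With `entangled_basis` the penalty is positive, signalling that an edit along it would also move the latent along an age direction.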

Paper Structure (18 sections, 15 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Multi-directional human face editing. Each row indicates changes in a subspace associated with a particular attribute: (top to bottom) gender, age and race.
  • Figure 2: Image editing capabilities using source images and target attribute images. The attributes derived from the target images include pose, smile, and race.
  • Figure 3: Comparison of real image editing between our method and the baseline approaches. The edit direction for each attribute was determined using a linear SVM classifier.
  • Figure 4: Sequential human face editing result.
  • Figure 5: Comparative analysis of the inter-attribute effects exhibited by different models. The upper rows display how changes in one attribute vary as a function of changes in another attribute. The bottom row provides an average of all the rows above.
  • ...and 7 more figures