U-Face: An Efficient and Generalizable Framework for Unsupervised Facial Attribute Editing via Subspace Learning

Bo Liu; Xuan Cui; Run Zeng; Wei Duan; Chongwen Liu; Jinrui Qian; Lianggui Tang; Hongping Gan

Abstract

Latent space-based facial attribute editing methods have gained popularity in applications such as digital entertainment, virtual avatar creation, and human-computer interaction systems due to their potential for efficient and flexible attribute manipulation, particularly for continuous edits. Among these, unsupervised latent space-based methods, which discover effective semantic vectors without relying on labeled data, have attracted considerable attention in the research community. However, existing methods still encounter difficulties in disentanglement, as manipulating a specific facial attribute may unintentionally affect other attributes, complicating fine-grained controllability. To address these challenges, we propose a novel framework designed to offer an effective and adaptable solution for unsupervised facial attribute editing, called Unsupervised Facial Attribute Controllable Editing (U-Face). The proposed method frames semantic vector learning as a subspace learning problem, where latent vectors are approximated within a lower-dimensional semantic subspace spanned by a semantic vector matrix. This formulation can also be equivalently interpreted from a projection-reconstruction perspective and further generalized into an autoencoder framework, providing a foundation that can support disentangled representation learning in a flexible manner. To improve disentanglement and controllability, we impose orthogonal non-negative constraints on the semantic vectors and incorporate attribute boundary vectors to reduce entanglement in the learned directions. Although these constraints make the optimization problem challenging, we design an alternating iterative algorithm, called Alternating Iterative Disentanglement and Controllability (AIDC), with closed-form updates and provable convergence under specific conditions.
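The projection-reconstruction reading of subspace learning mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the U-Face objective: it assumes latent vectors stored as columns of a matrix `Z`, keeps only an orthonormality constraint on the semantic matrix `W`, and omits the non-negativity constraint and the attribute boundary vectors. The function names `fit_subspace` and `reconstruct`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_subspace(Z, k):
    """Return W (d x k, orthonormal columns) minimizing ||Z - W W^T Z||_F^2.

    With only the orthonormality constraint, the minimizer is given by the
    top-k left singular vectors of Z (an assumption-level simplification).
    """
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]

def reconstruct(Z, W):
    """Project onto the subspace (encoder), then map back (decoder)."""
    P = W.T @ Z   # k x n coefficients of each latent vector in the subspace
    return W @ P  # d x n approximation of Z inside span(W)

# Synthetic latent vectors lying close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
d, n, k = 16, 200, 3
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
Z = basis @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((d, n))

W = fit_subspace(Z, k)
rel_err = np.linalg.norm(Z - reconstruct(Z, W)) / np.linalg.norm(Z)
```

In this simplified setting the encoder and decoder share the same matrix `W`, which is what makes the subspace view and the linear-autoencoder view coincide.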

Paper Structure (41 sections, 4 theorems, 30 equations, 10 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Supervised Latent Space-Based Facial Image Editing
Unsupervised Latent Space-Based Facial Image Editing
Diffusion-based Facial Image Editing
Proposed Framework
Latent Vector Subspace Learning
The Proposed Objective Function of U-Face
Orthogonal non-negative Constraints
Attribute Boundary Vector
Alternating Iterative Disentanglement and Controllability Algorithm
Updating $\mathbf W$ with $\mathbf P$ and $\mathbf F$ Fixed
Updating $\mathbf P$ with $\mathbf W$ and $\mathbf F$ Fixed
Updating $\mathbf F$ with $\mathbf W$ and $\mathbf P$ Fixed
Convergence and Time Complexity Analysis of AIDC
...and 26 more sections

Key Result

Theorem 1

Let $\{(\mathbf W^t,\mathbf P^t,\mathbf F^t)\}$ be the sequence generated by AIDC with $\alpha>0$ and $\lambda\ge 0$. Then the objective values are non-increasing, $J(\mathbf W^{t+1},\mathbf P^{t+1},\mathbf F^{t+1}) \le J(\mathbf W^{t},\mathbf P^{t},\mathbf F^{t})$, and the sequence $J(\mathbf W^{t},\mathbf P^{t},\mathbf F^{t})$ converges. Moreover, every accumulation point is a first-order stationary point of the problem in Eq. JWEq. Proof sketch. Exact minimization of each block yields $J(\mathbf W^{t+1},\math
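The monotone-descent argument (each block is minimized exactly, so the objective cannot increase) can be checked numerically on a simplified stand-in problem. The sketch below is not AIDC: it assumes a two-block objective $\|\mathbf Z-\mathbf W\mathbf P\|_F^2+\lambda\|\mathbf P\|_F^2$ with orthonormal $\mathbf W$, and it leaves out the $\mathbf F$ block, the non-negativity constraint, and the boundary-vector term. The names `update_W`, `update_P`, and `alternate`, and the value `lam=0.1`, are hypothetical.

```python
import numpy as np

def objective(Z, W, P, lam):
    """Stand-in objective: squared reconstruction error plus a ridge term on P."""
    return np.linalg.norm(Z - W @ P) ** 2 + lam * np.linalg.norm(P) ** 2

def update_W(Z, P):
    """Exact W-step under W^T W = I (orthogonal Procrustes via thin SVD)."""
    U, _, Vt = np.linalg.svd(Z @ P.T, full_matrices=False)
    return U @ Vt

def update_P(Z, W, lam):
    """Exact P-step: ridge solution, using W^T W = I to avoid a matrix inverse."""
    return (W.T @ Z) / (1.0 + lam)

def alternate(Z, k, lam=0.1, iters=20, seed=0):
    """Alternate the two closed-form updates and record the objective values."""
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((Z.shape[0], k)))[0]
    P = update_P(Z, W, lam)
    history = [objective(Z, W, P, lam)]
    for _ in range(iters):
        W = update_W(Z, P)
        P = update_P(Z, W, lam)
        history.append(objective(Z, W, P, lam))
    return W, P, history

rng = np.random.default_rng(1)
Z = rng.standard_normal((12, 100))
W, P, history = alternate(Z, k=4)
```

Because both updates are exact minimizers of their own block with the other held fixed, `history` should be non-increasing up to floating-point error, which mirrors the first claim of the theorem for this toy objective only.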

Figures (10)

Figure 1: Unsupervised Facial Attribute Controllable Editing (U-Face) framework.
Figure 1: Comparison of identity consistency preservation across SeFa, GANSpace, EnjoyGAN, InterFaceGAN, and U-Face using the $IDS$ metric for the Age, Gender, Smile, and Hairstyle attributes. The editing magnitude $\beta$ varies from $0.1$ to $0.8$.
Figure 2: $\mathbf w_1, \mathbf w_2, \ldots, \mathbf w_5$ are the five semantic vectors. (a) shows the original facial image $\mathbf M_{ori} = G(\mathbf z)$, and (b)-(f) show edited images $\mathbf M_{edit}^i = G(\mathbf z + \mathbf w_i),i=1,2,\ldots,5$. Each semantic vector simultaneously manipulates multiple facial attributes: $\mathbf w_1$ adjusts both glasses and age, $\mathbf w_2$ manipulates glasses and hairstyle, and the others similarly influence various attributes.
Figure 3: Qualitative comparison of U-Face and GAN-based baseline methods on StyleGAN and StyleGAN2 across the Age, Gender, Hairstyle, and Smile attributes.
Figure 4: Qualitative evaluation of U-Face and diffusion-based baseline methods on the Age, Gender, and Smile attributes.
...and 5 more figures

Theorems & Definitions (7)

Theorem 1: Monotone descent and first-order stationarity
Theorem 1: Monotone descent and first-order stationarity
proof
Lemma 1
proof
Theorem 2
proof
