Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance

Lisha Li, Jingwen Hou, Weide Liu, Yuming Fang, Jiebin Yan

TL;DR

This paper tackles the challenge of enhancing facial aesthetics while preserving identity in 2D images. It introduces NNSG-Diffusion, which derives 3D structure guidance from a nearest aesthetic prototype via a 3D Morphable Model (3DMM) and extracts depth and contour cues to steer Stable Diffusion through ControlNet. The method comprises three modules: Nearest Neighbor Face Searching (NNFS), Facial Guidance Extraction (FGE), and Face Beautification (FB). The 3D guidance $G_{3D} = \{G_{depth}, G_{contour}\}$ steers the diffusion process, and the identity-preserving fusion used for that guidance is $\boldsymbol{\alpha} = \lambda \boldsymbol{\alpha}_0 + \mu \boldsymbol{\alpha}_r$. Experiments on FFHQ, CelebAMask-HQ, and SCUT-FBP show higher attractiveness with identity preservation comparable to or better than contemporary GAN- and diffusion-based methods; subjective surveys corroborate the perceptual gains. The work further shows that combining 3D structure guidance with implicit ID-preservation methods (e.g., IP-Adapter) improves identity preservation without compromising aesthetic quality.
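The fusion formula above is a weighted sum of two identity coefficient vectors. The following is a minimal sketch of that arithmetic only, not the authors' implementation. It assumes that $\boldsymbol{\alpha}_0$ comes from the input face and $\boldsymbol{\alpha}_r$ from the reference face (the summary does not define the subscripts); the function name `fuse_identity`, the toy vectors, and the weight values are hypothetical.

```python
from typing import List, Sequence


def fuse_identity(alpha_0: Sequence[float], alpha_r: Sequence[float],
                  lam: float, mu: float) -> List[float]:
    """Return lam * alpha_0 + mu * alpha_r, element by element."""
    if len(alpha_0) != len(alpha_r):
        raise ValueError("coefficient vectors must have the same length")
    return [lam * a0 + mu * ar for a0, ar in zip(alpha_0, alpha_r)]


# Toy 3-dimensional vectors; real 3DMM identity vectors are much longer.
alpha_input = [1.0, 0.0, -2.0]     # assumed: alpha_0, from the input face
alpha_reference = [0.0, 4.0, 2.0]  # assumed: alpha_r, from the reference face

# Hypothetical weights: a larger lam keeps the result closer to the input.
fused = fuse_identity(alpha_input, alpha_reference, lam=0.75, mu=0.25)
print(fused)
```

Under this reading, raising `lam` relative to `mu` keeps the fused shape closer to the input identity, while raising `mu` pulls it toward the reference.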

Abstract

Facial Aesthetics Enhancement (FAE) aims to improve facial attractiveness by adjusting the structure and appearance of a facial image while preserving its identity as much as possible. Most existing methods adopted deep feature-based or score-based guidance for generation models to conduct FAE. Although these methods achieved promising results, they potentially produced excessively beautified results with lower identity consistency or insufficiently improved facial attractiveness. To enhance facial aesthetics with less loss of identity, we propose the Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion), a diffusion-based FAE method that beautifies a 2D facial image with 3D structure guidance. Specifically, we propose to extract FAE guidance from a nearest neighbor reference face. To allow for less change of facial structures in the FAE process, a 3D face model is recovered by referring to both the matched 2D reference face and the 2D input face, so that the depth and contour guidance can be extracted from the 3D face model. Then the depth and contour clues can provide effective guidance to Stable Diffusion with ControlNet for FAE. Extensive experiments demonstrate that our method is superior to previous relevant methods in enhancing facial aesthetics while preserving facial identity.
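The nearest neighbor reference lookup mentioned in the abstract can be illustrated with a small search over feature vectors. This is a sketch under stated assumptions: the abstract does not say which features or distance the NNFS module uses, so Euclidean distance over hypothetical coefficient vectors stands in here, and `nearest_reference` is an illustrative name rather than anything from the paper.

```python
import math
from typing import Sequence


def nearest_reference(query: Sequence[float],
                      references: Sequence[Sequence[float]]) -> int:
    """Return the index of the reference vector closest to the query."""
    if not references:
        raise ValueError("at least one reference vector is required")
    # Euclidean distance is an assumption; the source does not name the metric.
    return min(range(len(references)),
               key=lambda i: math.dist(query, references[i]))


# Hypothetical 2-D feature vectors for three reference faces.
reference_bank = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
best = nearest_reference([0.9, 1.2], reference_bank)
print(best)
```

Choosing the closest reference rather than an arbitrary attractive face is what keeps the structural change small, which matches the abstract's stated goal of less identity loss.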

Paper Structure

This paper contains 23 sections, 11 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: Our method provides effective beautification guidance by: (a) recovering 3DMM models [blanz2023morphable] from the 2D input image and the 2D reference image, and combining them to construct a 3D prototype; (b) extracting contour guidance and depth guidance from the 3D prototype. As shown in (c), our method improves facial attractiveness with the highest ID similarity (compared to the input) when both the contour and the depth guidance are supplied to Stable Diffusion [rombach2022high] with ControlNet [zhang2023adding].
  • Figure 2: Design of the proposed NNFS module. R-Net [deng2019accurate] is used to predict parameter vectors of decoupled 3D facial attributes: identity $\boldsymbol{\alpha}$, expression $\boldsymbol{\beta}$, texture $\boldsymbol{\delta}$, pose $\boldsymbol{p}$, and lighting $\boldsymbol{\gamma}$.
  • Figure 3: Design of the FGE module. The FGE module projects the fused 3D face with the PyTorch3D renderer [ravi2020pytorch3d, lassner2020pulsar] for guidance extraction.
  • Figure 4: Demonstration of using depth and contour guidance in Stable Diffusion [zhang2023adding] with control weights $(\omega, \eta)$. $Y_{c}$ and $Y_{d}$ are outputs of the intermediate block and encoding block and are fed directly into Stable Diffusion.
  • Figure 5: The framework of the proposed Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion). It consists of three modules: Nearest Neighbor Face Searching (NNFS), Facial Guidance Extraction (FGE), and Face Beautification (FB). The NNFS module searches for the reference face most similar to the input face, and the FGE module fuses the reference and input faces to extract 3D structure guidance. Finally, the FB module uses the 3D structure guidance to control the beautification process accurately.
  • ...and 11 more figures
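
The control weights $(\omega, \eta)$ in the Figure 4 caption can be read as per-branch scaling factors. A common way such weights act when several control branches feed one diffusion model is to scale each branch's output and sum the results; the sketch below shows only that arithmetic. It assumes $\omega$ scales the depth output $Y_{d}$ and $\eta$ scales the contour output $Y_{c}$ (the caption does not state the pairing), uses flat lists as stand-ins for feature maps, and `combine_controls` is a hypothetical name.

```python
from typing import List, Sequence


def combine_controls(y_d: Sequence[float], y_c: Sequence[float],
                     omega: float, eta: float) -> List[float]:
    """Return omega * y_d + eta * y_c, element by element."""
    if len(y_d) != len(y_c):
        raise ValueError("control outputs must have the same length")
    # Assumed pairing: omega weights depth, eta weights contour.
    return [omega * d + eta * c for d, c in zip(y_d, y_c)]


# Toy stand-ins for the depth and contour branch outputs.
depth_out = [1.0, 2.0]
contour_out = [4.0, 0.0]
combined = combine_controls(depth_out, contour_out, omega=0.5, eta=0.25)
print(combined)
```

Setting either weight to zero drops that branch entirely, which is a simple way to compare depth-only, contour-only, and combined guidance.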