Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance
Lisha Li, Jingwen Hou, Weide Liu, Yuming Fang, Jiebin Yan
TL;DR
This paper tackles the challenge of enhancing facial aesthetics in 2D images while preserving identity. It introduces NNSG-Diffusion, which derives 3D structure guidance from a nearest aesthetic prototype via a 3D Morphable Model and extracts depth and contour cues to steer diffusion with ControlNet. The method comprises three modules: Nearest Neighbor Face Searching (NNFS), Facial Guidance Extraction (FGE), and Face Beautification (FB). The 3D guidance $G_{3D} = \{G_{depth}, G_{contour}\}$ conditions the diffusion process, and the identity-preserving fusion is implemented as $\boldsymbol{\alpha} = \lambda \boldsymbol{\alpha}_0 + \mu \boldsymbol{\alpha}_r$ during guidance. Experiments on FFHQ, CelebAMask-HQ, and SCUT-FBP show higher attractiveness with identity preservation comparable to or better than contemporary GAN- and diffusion-based methods; subjective surveys corroborate the perceptual gains. The work further shows that combining 3D structure guidance with implicit ID-preservation methods (e.g., IP-Adapter) improves identity preservation without compromising aesthetic quality, which points to practical applicability.
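The fusion rule $\boldsymbol{\alpha} = \lambda \boldsymbol{\alpha}_0 + \mu \boldsymbol{\alpha}_r$ can be illustrated with a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes that $\boldsymbol{\alpha}_0$ and $\boldsymbol{\alpha}_r$ are 3D Morphable Model coefficient vectors of the input and reference faces, and the function name `fuse_coefficients` and the default weights are hypothetical choices made only for illustration.

```python
import numpy as np


def fuse_coefficients(alpha_input, alpha_ref, lam=0.7, mu=0.3):
    """Blend two coefficient vectors as alpha = lam * alpha_input + mu * alpha_ref.

    Assumption: both vectors are 3DMM coefficients of equal length.
    The default weights are illustrative, not values from the paper.
    """
    alpha_input = np.asarray(alpha_input, dtype=np.float64)
    alpha_ref = np.asarray(alpha_ref, dtype=np.float64)
    if alpha_input.shape != alpha_ref.shape:
        raise ValueError("coefficient vectors must have the same shape")
    # A larger lam keeps the result closer to the input face's structure;
    # a larger mu pulls it toward the reference face.
    return lam * alpha_input + mu * alpha_ref


if __name__ == "__main__":
    alpha_0 = np.array([1.0, 0.0, -2.0])  # toy input-face coefficients
    alpha_r = np.array([0.0, 1.0, 2.0])   # toy reference-face coefficients
    print(fuse_coefficients(alpha_0, alpha_r, lam=0.5, mu=0.5))
```

Setting `lam=1, mu=0` returns the input coefficients unchanged, so in this sketch the two weights act as a dial between identity preservation and structural change.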
Abstract
Facial Aesthetics Enhancement (FAE) aims to improve facial attractiveness by adjusting the structure and appearance of a facial image while preserving its identity as much as possible. Most existing methods adopt deep feature-based or score-based guidance for generation models to conduct FAE. Although these methods achieve promising results, they can produce excessively beautified results with lower identity consistency, or improve facial attractiveness insufficiently. To enhance facial aesthetics with less loss of identity, we propose the Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion), a diffusion-based FAE method that beautifies a 2D facial image with 3D structure guidance. Specifically, we propose to extract FAE guidance from a nearest neighbor reference face. To limit changes to facial structure during FAE, a 3D face model is recovered by referring to both the matched 2D reference face and the 2D input face, so that depth and contour guidance can be extracted from the 3D face model. The depth and contour cues then provide effective guidance to Stable Diffusion with ControlNet for FAE. Extensive experiments demonstrate that our method is superior to previous relevant methods in enhancing facial aesthetics while preserving facial identity.
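The idea of picking a nearest neighbor reference face can be sketched as a filtered nearest-neighbor lookup. The abstract does not specify the feature space, the distance, or how candidate references are selected, so everything below is an assumption for illustration: faces are represented by generic feature vectors, candidates are restricted to gallery faces whose attractiveness score meets a hypothetical threshold `min_score`, and closeness is measured with Euclidean distance.

```python
import numpy as np


def nearest_reference(query, gallery, scores, min_score=4.0):
    """Return the gallery index of the closest sufficiently attractive face.

    Assumptions (not from the paper): `query` is a 1-D feature vector,
    `gallery` is an (N, D) array of feature vectors, `scores` holds one
    attractiveness score per gallery face, and `min_score` is an
    illustrative threshold.
    """
    query = np.asarray(query, dtype=np.float64)
    gallery = np.asarray(gallery, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)

    # Keep only gallery faces that meet the attractiveness threshold.
    candidates = np.flatnonzero(scores >= min_score)
    if candidates.size == 0:
        raise ValueError("no gallery face meets the attractiveness threshold")

    # Euclidean distance from the query to each remaining candidate.
    dists = np.linalg.norm(gallery[candidates] - query, axis=1)
    return int(candidates[np.argmin(dists)])


if __name__ == "__main__":
    gallery = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    scores = np.array([2.0, 4.5, 4.8])
    print(nearest_reference([0.1, 0.1], gallery, scores))
```

In this toy example the first gallery face is the closest overall but is excluded by the threshold, so the lookup falls back to the nearest face among the remaining candidates.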
