
A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing

Yu-Kai Huang, Yutong Zheng, Yen-Shuo Su, Anudeepsekhar Bolimera, Han Zhang, Fangyi Chen, Marios Savvides

TL;DR

This work introduces a framework that combines the strengths of latent-based and reference-based editing: it embeds attributes from a reference image into a tri-plane space, which keeps edits 3D-consistent and realistic when viewed from multiple perspectives.
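To make "tri-plane space" concrete, here is a minimal sketch of how a feature is read out of a tri-plane for a single 3D point. It assumes an EG3D-style layout (three axis-aligned feature planes XY, XZ and YZ, bilinearly sampled and summed); the page does not state the paper's exact aggregation, and the names `bilinear_sample` and `query_triplane` are hypothetical.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1]."""
    _, H, W = plane.shape
    # Map normalized coordinates to pixel space (align-corners convention), clamped to the grid.
    x = np.clip((u + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((v + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * plane[:, y0, x0] + wx * plane[:, y0, x1]
    bottom = (1 - wx) * plane[:, y1, x0] + wx * plane[:, y1, x1]
    return (1 - wy) * top + wy * bottom


def query_triplane(triplane: np.ndarray, point) -> np.ndarray:
    """Return the (C,) feature of a 3D point.

    triplane: (3, C, H, W) array holding the XY, XZ and YZ planes (assumed order).
    point: (x, y, z) with each coordinate in [-1, 1].
    """
    x, y, z = point
    f_xy = bilinear_sample(triplane[0], x, y)  # project onto the XY plane
    f_xz = bilinear_sample(triplane[1], x, z)  # project onto the XZ plane
    f_yz = bilinear_sample(triplane[2], y, z)  # project onto the YZ plane
    # Assumed EG3D-style aggregation: sum the three projected features.
    return f_xy + f_xz + f_yz
```

In a full 3D GAN, the aggregated feature would then be decoded into density and color (and, here, semantics) before volume rendering; that decoder is outside the scope of this sketch.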

Abstract

Facial attribute editing plays a crucial role in synthesizing faces with specific characteristics while maintaining a realistic appearance. Despite recent advances, achieving precise, 3D-aware attribute modifications remains challenging, even though such modifications are crucial for representing faces consistently and accurately from different angles. Current methods suffer from semantic entanglement and lack effective guidance for incorporating attributes while preserving image integrity. To address these issues, we introduce a novel framework that combines the strengths of latent-based and reference-based editing methods. Our approach employs a 3D GAN inversion technique to embed attributes from the reference image into a tri-plane space, ensuring 3D consistency and realistic rendering from multiple viewpoints. We use blending techniques and predicted semantic masks to locate precise edit regions and merge them with contextual guidance from the reference image. A coarse-to-fine inpainting strategy then preserves the integrity of untargeted areas, significantly enhancing realism. Our evaluations demonstrate superior performance across diverse editing tasks, validating the framework's effectiveness for realistic and practical facial attribute editing.
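The blending step described above (and in Figure 2, which names alpha blending) can be sketched as a per-location convex combination of identity and reference tri-plane features. This is a minimal illustration under stated assumptions: the predicted semantic masks are taken to be already available as per-plane label maps in tri-plane space, and `labels_to_mask` and `alpha_blend_triplanes` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def labels_to_mask(labels: np.ndarray, target_classes) -> np.ndarray:
    """Turn (3, H, W) integer semantic label planes into a (3, 1, H, W) float mask.

    Assumption: semantic predictions are already expressed per tri-plane.
    A value of 1 marks locations belonging to the attribute being edited.
    """
    return np.isin(labels, list(target_classes)).astype(np.float32)[:, None]


def alpha_blend_triplanes(identity: np.ndarray, reference: np.ndarray,
                          mask: np.ndarray) -> np.ndarray:
    """Alpha-blend reference tri-plane features into the identity tri-plane.

    identity, reference: (3, C, H, W) canonical tri-plane features.
    mask: (3, 1, H, W) soft mask in [0, 1]; 1 takes the reference feature.
    """
    if identity.shape != reference.shape:
        raise ValueError("identity and reference tri-planes must have the same shape")
    mask = np.clip(mask, 0.0, 1.0)
    # Convex combination per location; the singleton channel axis broadcasts over C.
    return mask * reference + (1.0 - mask) * identity
```

A soft (for example, blurred) mask gives a gradual transition at region borders, while a hard binary mask copies reference features only inside the target region and leaves every other identity feature untouched.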

Paper Structure (38 sections, 29 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Through our editing framework, we demonstrate seven types of local facial attribute edits, featuring novel-view RGB and semantic renderings. Our method consistently aligns semantic regions from the reference images with the identity image, ensuring accurate edits despite pose variations.
  • Figure 2: In the initial stage, we embed both the identity and reference images into the canonical tri-plane space, delineate the target editing area using predicted semantic masks, and incorporate features into the identity through alpha blending.
  • Figure 3: After extracting semantic masks from the blended tri-plane features, we employ a coarse-to-fine inpainting technique to incorporate the reference context into the identity image, ensuring that unmodified areas are preserved to generate the final output.
  • Figure 4: Utilizing the Image-to-plane module, we convert the composite image into a tri-plane representation and subsequently perform volume rendering to generate novel-view RGB and semantic images.
  • Figure 5: Qualitative comparison with other competing editing methods. The top three rows display the addition of eyeglasses, while the remaining rows illustrate changes to hair.
  • ...and 5 more figures
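Figure 4 mentions volume rendering that produces both novel-view RGB and semantic images. A minimal sketch of that idea, assuming standard NeRF-style quadrature along one ray, with per-sample class probabilities composited using the same weights as color (as in Semantic-NeRF), is shown below. The paper's exact renderer is not described on this page, and `render_ray` is a hypothetical name.

```python
import numpy as np


def render_ray(sigmas, colors, sem_probs, deltas):
    """Composite per-sample density, color, and semantics along one ray.

    sigmas: (N,) densities; colors: (N, 3) RGB; sem_probs: (N, K) class
    probabilities; deltas: (N,) distances between consecutive samples.
    Returns (rgb (3,), semantics (K,), weights (N,)).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # Opacity of each ray segment.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of light reaching sample i.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    rgb = weights @ np.asarray(colors, dtype=float)
    # Reusing the same weights keeps the semantic map pixel-aligned with the RGB image.
    sem = weights @ np.asarray(sem_probs, dtype=float)
    return rgb, sem, weights
```

Taking the argmax of the composited semantics per pixel yields a label map rendered from the same viewpoint as the RGB image, which is what makes view-consistent semantic renderings like those in Figures 1 and 4 possible.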