GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor

Xiangyue Liu, Kunming Luo, Heng Li, Qi Zhang, Yuan Liu, Li Yi, Ping Tan

TL;DR

GaussianAvatar-Editor addresses the problem of text-driven editing of animatable 4D Gaussian head avatars, tackling motion occlusion and 4D spatial-temporal inconsistency. It introduces the Weighted Alpha Blending Equation (WABE) to preferentially update visible Gaussians during editing, with $\boldsymbol{C(x)} = \sum_{k=1}^{K} w_k \boldsymbol{c}_k \alpha_k \prod_{j=1}^{k-1} (1-\alpha_j)$ and $w_k = e^{-\beta (1-\prod_{j=1}^{k-1} (1-\alpha_j))}$, where $\beta=6$ in experiments, effectively suppressing updates to occluded parts. To ensure 4D consistency, the method extends a render-edit-aggregate pipeline with a reconstruction loss $\text{L}_{recon} = \text{L}_{L1} + \text{SSIM}$ and a temporal adversarial loss that enforces coherent evolution across time steps. Empirical results on the NeRSemble dataset show superior novel-view rendering, self-reenactment, and cross-identity reenactment compared with relevant baselines, establishing the approach as a strong solution for photorealistic animatable Gaussian avatar editing with text guidance.
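
To make the WABE weighting above concrete, the following is a minimal NumPy sketch that evaluates the stated formula for a single ray. It assumes the samples are already depth-sorted front to back. The function name `wabe_blend` and its argument layout are hypothetical, chosen for illustration, and are not taken from the authors' code. How the weights are wired into the actual Gaussian rasterizer and its gradients is not described here, so the sketch only shows the arithmetic.

```python
import numpy as np

def wabe_blend(colors, alphas, beta=6.0):
    """Composite K depth-sorted samples on one ray using WABE weights.

    colors: (K, 3) per-sample colors c_k, ordered front to back (assumed layout).
    alphas: (K,) per-sample opacities alpha_k in [0, 1].
    beta:   decay factor; the summary above reports beta = 6.
    Returns the blended color C and the per-sample weights w_k.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)

    # Transmittance T_k = prod_{j<k} (1 - alpha_j); the empty product gives T_1 = 1.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))

    # w_k = exp(-beta * (1 - T_k)): equals 1 for an unoccluded sample and
    # decays toward exp(-beta) as the opacity accumulated in front of it grows.
    weights = np.exp(-beta * (1.0 - transmittance))

    # C = sum_k w_k * c_k * alpha_k * T_k
    blended = (weights * alphas * transmittance) @ colors
    return blended, weights


if __name__ == "__main__":
    # A red sample in front of a green one, both half-opaque.
    color, w = wabe_blend([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
    print("weights:", w)
    print("blended color:", color)
```

In this toy case the front sample keeps weight 1, while the sample behind 50% accumulated opacity is scaled by $e^{-3} \approx 0.05$. Setting `beta=0` makes every weight 1 and recovers standard alpha blending, which is one way to see that the weights only rescale the contribution of occluded samples.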

Abstract

We introduce GaussianAvatar-Editor, an innovative framework for text-driven editing of animatable Gaussian head avatars that can be fully controlled in expression, pose, and viewpoint. Unlike static 3D Gaussian editing, editing animatable 4D Gaussian avatars presents challenges related to motion occlusion and spatial-temporal inconsistency. To address these issues, we propose the Weighted Alpha Blending Equation (WABE). This function enhances the blending weight of visible Gaussians while suppressing the influence of non-visible Gaussians, effectively handling motion occlusion during editing. Furthermore, to improve editing quality and ensure 4D consistency, we incorporate conditional adversarial learning into the editing process. This strategy helps to refine the edited results and maintain consistency throughout the animation. By integrating these methods, our GaussianAvatar-Editor achieves photorealistic and consistent results in animatable 4D Gaussian editing. We conduct comprehensive experiments across various subjects to validate the effectiveness of our proposed techniques; the results demonstrate the superiority of our approach over existing methods. More results and code are available at: [Project Link](https://xiangyueliu.github.io/GaussianAvatar-Editor/).

Paper Structure (16 sections, 8 equations, 10 figures, 1 table)

This paper contains 16 sections, 8 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: We introduce GaussianAvatar-Editor, a method for text-driven editing of animatable Gaussian head avatars with fully controllable expression, pose, and viewpoint. Qualitative results of GaussianAvatar-Editor at inference time are shown above. The edited avatars achieve photorealistic results with strong spatial and temporal consistency.
  • Figure 2: Overview of our method. We follow a render-edit-aggregate optimization pipeline as in Instruct-NeRF2NeRF (Haque et al., 2023). We introduce a Weighted Alpha Blending Equation (WABE) to overcome the motion occlusion problem, along with novel loss functions to enhance spatial-temporal consistency. Our edited avatars can generate high-quality and consistent 4D renderings and can be controlled by other actors.
  • Figure 3: Illustration of the Weighted Alpha Blending Equation (WABE), which is adjusted to suppress non-visible parts while enhancing visible parts. Lower left: results when WABE is disabled. Lower right: when WABE is enabled, motion-occluded regions like teeth and tongue can be successfully optimized.
  • Figure 4: Our results on novel view synthesis. We show our edited results using the text prompt “Turn her into the Tolkien Elf”.
  • Figure 5: Comparison on novel view synthesis. Our method produces higher-quality and more multi-view-consistent results than the baselines.
  • ...and 5 more figures