AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion

Jotaro Sakamiya, I-Chao Shen, Jinsong Zhang, Mustafa Doga Dogan, Takeo Igarashi

TL;DR

AvatarPerfect addresses artifacts in 3D Gaussian Splatting (3DGS) avatars trained from monocular video by enabling user-guided refinement through 2D image editing and automatic pose suggestions. The system renders artifact-laden views with proposed body and camera poses, lets users edit the 2D renderings with background, inpaint, and diffusion-inpaint tools, and then retrains the 3DGS avatar using both the original video and the user edits. Key contributions include a next-best-view-inspired pose-suggestion mechanism based on Gaussian visibility, a practical 2D editing workflow for repairing floating and anomalous color Gaussians, and a retraining scheme that integrates user edits with automatic updates. A user study and crowdsourced evaluations indicate that AvatarPerfect yields higher-quality 3DGS avatars under novel poses than a baseline editor, highlighting the value of combining intuitive 2D editing with automated pose guidance for 3D avatar refinement in VR and telepresence applications.

Abstract

Creating high-quality 3D avatars using 3D Gaussian Splatting (3DGS) from a monocular video benefits virtual reality and telecommunication applications. However, existing automatic methods exhibit artifacts under novel poses due to limited information in the input video. We propose AvatarPerfect, a novel system that allows users to iteratively refine 3DGS avatars by manually editing the rendered avatar images. In each iteration, our system suggests a new body and camera pose to help users identify and correct artifacts. The edited images are then used to update the current avatar, and our system suggests the next body and camera pose for further refinement. To investigate the effectiveness of AvatarPerfect, we conducted a user study comparing our method to an existing 3DGS editor, SuperSplat, which allows direct manipulation of Gaussians without automatic pose suggestions. The results indicate that our system enables users to obtain higher-quality refined 3DGS avatars than the existing 3DGS editor.
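
The iterative workflow described in the abstract can be sketched as a simple loop. This is only an illustration of the control flow, not the authors' implementation: the function `refine_avatar` and the callables `suggest_pose`, `render`, `edit_image`, and `retrain` are hypothetical placeholders for the pose suggestion, rendering, user editing, and avatar update steps.

```python
def refine_avatar(avatar, video_frames, suggest_pose, render,
                  edit_image, retrain, num_iterations):
    """Sketch of the suggest -> edit -> update loop (all callables are hypothetical)."""
    edited_views = []  # accumulated (edited image, body pose, camera pose) triples
    for _ in range(num_iterations):
        # 1. The system proposes a body pose and a camera pose for inspection.
        body_pose, camera_pose = suggest_pose(avatar)
        # 2. The current avatar is rendered under the suggested poses.
        image = render(avatar, body_pose, camera_pose)
        # 3. The user repairs artifacts in the 2D rendering.
        edited = edit_image(image)
        edited_views.append((edited, body_pose, camera_pose))
        # 4. The avatar is updated using the original video and all edits so far.
        avatar = retrain(avatar, video_frames, edited_views)
    return avatar, edited_views
```

Passing the steps in as callables keeps the sketch independent of any particular renderer or trainer; in an interactive system the loop would end when the user is satisfied rather than after a fixed `num_iterations`.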

Paper Structure

This paper contains 38 sections, 9 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Common artifacts in 3DGS avatars trained from a monocular input video. (a) Floating Gaussians: These arise in the occluded parts of the input video, where accurate estimation of Gaussian skinning weights is not feasible. (b) Anomalous Color Gaussians: These predominantly occur in the occluded parts of the input video, leading to unnatural Gaussian colors.
  • Figure 2: Concept of 3DGS Rendering. In 3D Gaussian Splatting, we represent 3D scenes and objects as a collection of Gaussians. When rendering a 2D image $I$, we determine the color of each pixel $p$ with a set of ordered Gaussians $\mathcal{N}_p$ that intersect with the ray corresponding to pixel $p$. First, we calculate the contribution of each Gaussian $g_i$ to pixel $p$ as the product of $T_i$, the transmittance of the Gaussians located in front of $g_i$, and $\alpha_i^p$, the $\alpha$-blending value derived from the value of Gaussian $g_i$ at the point where it intersects the ray. We then determine the color of pixel $p$ by applying $\alpha$-blending based on the contribution of each Gaussian $g_i$.
  • Figure 3: General 3DGS avatar training pipeline. In training a 3DGS avatar, the process iteratively repeats the calculation of image loss $L_\text{img}$ between the rendered image and the sampled image, followed by backpropagation of the gradients, similar to the original 3DGS. First, we sample a frame $I_t$ from the input video $\mathcal{I}$. At this point, each frame holds the body pose parameter $\theta_t$ of the subject and the camera pose parameter $\tau_t$. Using these parameters $\theta_t$ and $\tau_t$, we render the 3DGS avatar $\mathcal{G}$ into a 2D image by the rendering function $\mathcal{R}$. We then calculate the image loss $L_\text{img}$ between the rendered 2D image $\mathcal{R}(\mathcal{G}, \theta_t, \tau_t)$ and the sampled image $I_t$, and backpropagate the gradients to optimize the 3DGS avatar itself. In addition, we prune and densify the Gaussians in the 3DGS avatar according to the gradient information, as in the original 3DGS.
  • Figure 4: User interface of AvatarPerfect. In (a), users view the image rendered with the suggested body and camera pose. Our system provides three tools to edit the suggested image: (b) background tool, (c) inpaint tool, and (d) diffusion-inpaint tool. The background tool enables users to erase the floating Gaussians by painting selected regions white, which our system internally treats as background. With the inpaint tool, users can paint the selected regions with a chosen color to address the anomalous color Gaussians. The diffusion-inpaint tool works like the inpaint tool but uses a diffusion model to paint the selected region. After refining the suggested image, users can click the (e) "Update Avatar" button to prompt our system to update the avatar with the edited image. After the 3DGS avatar update, our system suggests another avatar image for further refinement.
  • Figure 5: Usage of the 2D image editing tools in AvatarPerfect. (a) With the background tool, the user encloses areas of unwanted Gaussians in the suggested image $J_k \in \mathcal{J}$ and fills them with the background color. (b) With the inpaint tool, the user encloses the area of the suggested image $J_k \in \mathcal{J}$ whose color they want to change and fills it with a chosen color. (c) With the diffusion-inpaint tool, the user encloses an area of the suggested image $J_k \in \mathcal{J}$ in the same way, and the selected area is filled using a diffusion model.
  • ...and 6 more figures
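
The per-pixel compositing described in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes the standard front-to-back formulation, in which the transmittance $T_i$ is the product of $(1 - \alpha_j^p)$ over the Gaussians in front of $g_i$. The function name `composite_pixel` and the list-based inputs are illustrative and are not taken from the paper.

```python
def composite_pixel(colors, alphas):
    """Front-to-back alpha blending for a single pixel.

    colors: list of (r, g, b) tuples for the Gaussians hit by the ray,
            ordered from nearest to farthest.
    alphas: list of alpha-blending values in [0, 1], in the same order.
    Returns the blended pixel color and each Gaussian's contribution T_i * alpha_i.
    """
    transmittance = 1.0           # T_i: fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    contributions = []
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha      # contribution of this Gaussian
        contributions.append(weight)
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha        # attenuate for Gaussians behind
    return tuple(pixel), contributions


# Example: a red Gaussian in front of a blue one, both with alpha 0.5.
pixel, contributions = composite_pixel(
    [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5]
)
```

In this example the front Gaussian contributes 0.5 and the one behind it 0.25, because half of the light has already been absorbed by the time the ray reaches it.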