AvatarBrush: Monocular Reconstruction of Gaussian Avatars with Intuitive Local Editing

Mengtian Li, Shengxiang Yao, Yichen Pan, Haiyao Xiao, Zhongmei Li, Zhifeng Xie, Keyu Chen

TL;DR

AvatarBrush presents the Gaussian Morphing Avatar (GMA), a three-layer representation that unifies SMPL-X geometry with a Gaussian-based appearance to build editable, animatable avatars from monocular video. Through Avatar Morphing and a two-stage training strategy, it achieves accurate geometry and rich texture while enabling localized edits, garment transfer, and texture painting without retraining. Quantitative and qualitative results demonstrate superior reconstruction quality and editing capabilities on two datasets, with real-time rendering suitable for virtual production and digital-human applications. The work emphasizes efficient, controllable avatar creation by decoupling geometry, texture, and appearance, paving the way for dynamic, user-driven digital humans.

Abstract

The efficient reconstruction of high-quality and intuitively editable human avatars presents a pressing challenge in the field of computer vision. Recent advancements, such as 3DGS, have demonstrated impressive reconstruction efficiency and rapid rendering speeds. However, intuitive local editing of these representations remains a significant challenge. In this work, we propose AvatarBrush, a framework that reconstructs fully animatable and locally editable avatars using only a monocular video input. We propose a three-layer model to represent the avatar and, inspired by mesh morphing techniques, design a framework to generate the Gaussian model from local information of the parametric body model. Compared to previous methods that require scanned meshes or multi-view captures as input, our approach reduces costs and enhances editing capabilities such as body shape adjustment, local texture modification, and geometry transfer. Our experimental results demonstrate superior quality across two datasets and emphasize the enhanced, user-friendly, and localized editing capabilities of our method.

Paper Structure

This paper contains 23 sections, 8 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Using a monocular video as input, we fit a set of features to generate an editable 3D avatar. Leveraging our specialized representation, GMA, this avatar can be easily edited in both texture and geometry by transferring features, while also supporting animation with hand poses and expressions. Our avatar model facilitates the transfer of garments across different identities and allows for the stamping of logos and other customizable elements in a user-friendly, interactive, real-time editing interface.
  • Figure 2: The framework of Avatar Morphing. Starting from a standard SMPL-X mesh $\mathcal{S}$, our method first learns per-face features to generate a morphed mesh $\mathcal{S}_{\text{morph}}$ that captures the coarse geometry; the morphing is driven by optimizing a coarse Gaussian set $\mathcal{G}_{\text{coarse}}$. The morphed mesh then serves as a scaffold on which a dense set of fine-grained Gaussians $\mathcal{G}_{\text{fine}}$ is placed to render the final detailed appearance. The resulting layered GMA representation allows for intuitive local editing and animation.
  • Figure 3: Direct editing with features. Users can select specific faces through the interface and modify the corresponding features to edit texture or geometry locally.
  • Figure 4: Novel-pose comparison on X-Human. We show results for novel-pose animation and garment transfer. Our method produces high-quality reconstructions and, compared to other methods, reduces floating artifacts in novel poses. It also allows editing of clothing style and texture.
  • Figure 5: Novel-view comparison on ZJU-mocap. We show results for novel-view synthesis and local geometry editing. Our method ensures a robust and consistent alignment of the garment with the underlying body motion. We also demonstrate its effectiveness by transferring the hood to another avatar in a novel pose.
  • ...and 9 more figures
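
The scaffold idea in the Figure 2 caption and the face-selection edit in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names (`place_gaussians`, `transfer_faces`), the use of a single scalar offset per face, and the random barycentric sampling are all assumptions made for illustration. It only shows why anchoring Gaussians to mesh faces makes an edit local: changing the feature of one face moves only the Gaussians attached to that face.

```python
import random


def face_normal(a, b, c):
    """Unit normal of triangle (a, b, c) from the cross product of two edges.

    Assumes a non-degenerate triangle (non-zero area).
    """
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    return tuple(x / length for x in n)


def place_gaussians(vertices, faces, face_offsets, per_face=4, seed=0):
    """Return, for each face, a list of Gaussian centres anchored to that face.

    face_offsets[i] is a hypothetical per-face feature: a scalar displacement
    along the face normal. A real system would carry richer features.
    """
    rng = random.Random(seed)
    centres = []
    for fi, (i, j, k) in enumerate(faces):
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = face_normal(a, b, c)
        points = []
        for _ in range(per_face):
            # Uniform sample inside the triangle (square-root parameterisation).
            r1, r2 = rng.random(), rng.random()
            s = r1 ** 0.5
            w0, w1, w2 = 1.0 - s, s * (1.0 - r2), s * r2
            points.append(tuple(
                w0 * a[d] + w1 * b[d] + w2 * c[d] + face_offsets[fi] * n[d]
                for d in range(3)
            ))
        centres.append(points)
    return centres


def transfer_faces(target_feats, source_feats, selected):
    """Local edit: copy features for the selected face indices only."""
    edited = list(target_feats)
    for fi in selected:
        edited[fi] = source_feats[fi]
    return edited


# A flat unit square split into two triangles, both facing +z.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

flat = [0.0, 0.0]                 # target avatar: no displacement
bumpy = [0.5, 0.1]                # source avatar: displaced faces
edited = transfer_faces(flat, bumpy, selected=[1])
centres = place_gaussians(vertices, faces, edited)
```

In this toy setup only the Gaussians on face 1 are lifted off the plane after the edit, while those on face 0 stay where they were. That is the sense in which a per-face representation supports localized geometry transfer.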