Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On

Zhiyi Chen, Hsuan-I Ho, Tianjian Jiang, Jie Song, Manuel Kaufmann, Chen Guo

TL;DR

Gaussian Wardrobe is a novel framework that digitalizes compositional 3D neural avatars from multi-view videos. It learns to disentangle each garment layer, canonicalizes it into a shape-independent space, and builds avatars from multiple layers of free-form garments.

Abstract

We introduce Gaussian Wardrobe, a novel framework to digitalize compositional 3D neural avatars from multi-view videos. Existing methods for 3D neural avatars typically treat the human body and clothing as an inseparable entity. However, this paradigm fails to capture the dynamics of complex free-form garments and limits the reuse of clothing across different individuals. To overcome these problems, we develop a novel, compositional 3D Gaussian representation to build avatars from multiple layers of free-form garments. The core of our method is decomposing neural avatars into bodies and layers of shape-agnostic neural garments. To achieve this, our framework learns to disentangle each garment layer from multi-view videos and canonicalizes it into a shape-independent space. In experiments, our method models photorealistic avatars with high-fidelity dynamics, achieving new state-of-the-art performance on novel pose synthesis benchmarks. In addition, we demonstrate that the learned compositional garments contribute to a versatile digital wardrobe, enabling a practical virtual try-on application where clothing can be freely transferred to new subjects. Project page: https://ait.ethz.ch/gaussianwardrobe

Paper Structure (41 sections, 11 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Gaussian Wardrobe digitalizes compositional neural avatars from multi-view videos. Our pipeline consists of two major components: (left) a compositional Gaussian representation and (right) a framework for learning neural garments. We first reconstruct a mesh template $\mathcal{M}$ from the first video frame and segment it into a body template $\mathcal{M}_b$ and garment templates $\mathcal{M}_{\{u,\ell\}}$ in the zero-shaped canonical space. During training, each layer learns a separate U-Net $\mathcal{F}$ to predict the parameters of 3D Gaussian primitives $\mathbf{M}$ from pose-conditioned positional maps $\mathbf{P}$. We composite the 3D Gaussians $\mathcal{G}$ from all layers and render RGB images $\hat{I}$ and segmentation masks $\hat{S}$ to compute the training loss $\mathcal{L}$. The learned neural garments are shape-agnostic and can be seamlessly transferred to other subjects for avatar virtual try-on.
  • Figure 2: Example of avatar virtual try-on. Given a reconstructed avatar (left), we replace its lower garment with a new skirt, $\mathcal{M}_\ell'$. The combined avatar can be animated to a novel pose $\theta'$, which may introduce minor penetration artifacts (middle). We resolve these artifacts on the fly during rendering with our online penetration detection algorithm (right).
  • Figure 3: Qualitative comparisons of novel pose synthesis on the 4D-Dress and ActorsHQ datasets. Our method better models the dynamics of free-form garments, such as skirts (top) and vests (middle), and generates realistic renderings with sharper facial and garment details. In contrast, the baseline methods suffer from artifacts, such as blurry faces and semi-transparent clothing, and fail to reproduce fine details like wrinkles or pockets.
  • Figure 4: Visualization of the effect of each loss term. We render segmentation masks of the garment layers to visualize the impact of each regularization term. Our full model produces the cleanest results, while removing $\mathcal{L}_{\text{pe}}$, $\mathcal{L}_{\text{reg}}$, or $\mathcal{L}_{\text{sg}}$ leads to self-penetration or irregular segmentation.
  • Figure 5: Importance of the segmentation loss in virtual try-on. We visualize virtual try-on results from variants of our method trained with and without $\mathcal{L}_{\text{sg}}$. Without it, the garment’s appearance becomes corrupted due to the entanglement of body and clothing.
  • ...and 6 more figures
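
The layer composition in Figure 1 and the garment swap in Figure 2 can be illustrated with a small sketch. This is not the authors' implementation: the `GaussianLayer` class, the `compose` and `swap_garment` helpers, and the per-Gaussian layer label are hypothetical names chosen for illustration, and each Gaussian is reduced to its center only. The sketch assumes that every layer lives in the same canonical space, so that a garment layer can be replaced without touching the body layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GaussianLayer:
    """One avatar layer (hypothetical structure, for illustration only)."""
    label: str            # e.g. "body", "upper", "lower"
    centers: List[Vec3]   # Gaussian centers; real primitives carry more parameters


def compose(layers: Dict[str, GaussianLayer]) -> List[Tuple[Vec3, str]]:
    """Concatenate the Gaussians of all layers, keeping a per-Gaussian layer label.

    The label is an assumption of this sketch; it stands in for whatever
    information a renderer would need to produce per-layer segmentation masks.
    """
    return [(c, layer.label) for layer in layers.values() for c in layer.centers]


def swap_garment(layers: Dict[str, GaussianLayer], slot: str,
                 new_layer: GaussianLayer) -> Dict[str, GaussianLayer]:
    """Return a new avatar with one garment layer replaced; the body is kept."""
    if slot == "body":
        raise ValueError("only garment layers can be swapped")
    if slot not in layers:
        raise KeyError(f"unknown layer slot: {slot}")
    out = dict(layers)      # shallow copy, so the original avatar is unchanged
    out[slot] = new_layer   # replacing a key keeps its position in the dict
    return out


# Toy avatar with a body, an upper garment, and a lower garment.
avatar = {
    "body": GaussianLayer("body", [(0.0, 0.0, 0.0)]),
    "upper": GaussianLayer("upper", [(0.0, 0.4, 0.0)]),
    "lower": GaussianLayer("lower", [(0.0, -0.5, 0.0), (0.0, -0.6, 0.0)]),
}
# Hypothetical skirt layer taken from another subject's wardrobe.
skirt = GaussianLayer("lower", [(0.0, -0.5, 0.1)])

dressed = swap_garment(avatar, "lower", skirt)
composited = compose(dressed)
```

Keeping each layer as a separate object is what makes the swap a one-line replacement here; resolving the penetration artifacts mentioned in Figure 2 is outside the scope of this sketch.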