MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

TL;DR

Experiments show that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos, improving on Human NeRF and Gaussian Splatting by 33.94% and 16.75% in LPIPS*, respectively.

Abstract

Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions, and achieving realistic clothing deformation remains a notable challenge. Current methods often overlook the influence of motion on surface deformation, resulting in surfaces that lack the constraints imposed by global motion. To overcome these limitations, we introduce Motion-Based 3D Clothed Humans Synthesis (MOSS), a framework that employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates the matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in the single-view setting, UID builds on KGAS to identify significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve on Human NeRF and Gaussian Splatting by 33.94% and 16.75% in LPIPS*, respectively. Code is available at https://wanghongsheng01.github.io/MOSS/.
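
The abstract refers to the density and rotation factors of the matrix-Fisher distribution. As a rough illustration only, the sketch below shows the standard proper SVD of a 3×3 matrix-Fisher parameter F, which separates a rotational part (U, V) from a concentration part (s), and the resulting mode rotation. The function names `proper_svd` and `fisher_mode` are hypothetical, and how MOSS maps these factors onto individual Gaussians is not specified here, so this is not the authors' KGAS implementation.

```python
import numpy as np

def proper_svd(F):
    """Proper SVD of a 3x3 matrix: F = U @ diag(s) @ V.T with det(U) = det(V) = +1.

    The last singular value may become negative so that U and V stay rotations.
    """
    U, s, Vt = np.linalg.svd(F)
    V = Vt.T
    det_u = np.sign(np.linalg.det(U))
    det_v = np.sign(np.linalg.det(V))
    # Flip the last column where needed so both factors are proper rotations.
    U = U @ np.diag([1.0, 1.0, det_u])
    V = V @ np.diag([1.0, 1.0, det_v])
    # Compensate in the last singular value so the product still equals F.
    s = s * np.array([1.0, 1.0, det_u * det_v])
    return U, s, V

def fisher_mode(F):
    """Most likely rotation under the density p(R) proportional to exp(trace(F.T @ R))."""
    U, _, V = proper_svd(F)
    return U @ V.T
```

For example, calling `fisher_mode(10.0 * R0)` with a rotation matrix `R0` returns `R0` itself, while the entries of `s` indicate how tightly the distribution concentrates around that mode.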

Paper Structure (54 sections, 42 equations, 9 figures, 6 tables)

This paper contains 54 sections, 42 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: MOSS reconstructs 3D clothed humans with detailed joints and fine clothing folds. The right image shows that MOSS surpasses the visual quality of previous works on MonoCap (LPIPS* = LPIPS $\times 10^{3}$). Larger circles denote higher FPS.
  • Figure 2: MOSS framework. MOSS conditions the Fisher distribution of child joints on the Fisher distribution of their parent joints within the kinematic hierarchy tree, thereby linking the rotation matrix of each joint to the global motion through Joint-Driven Orientation Refinement. The UID is employed to detect and locate areas with numerous surface folds on the human body. In these regions, the Gaussians are scaled by the axial matrix from the SVD of the Fisher and rotated by the directional matrix predicted from the Fisher using KGAS. The T-pose is then converted to the target pose, and the surface folds are refined accordingly.
  • Figure 3: Fisher's Gaussian Sampling. This is a 2D example of how spindle concentration affects Gaussian sampling. Note that the color bar represents the sampling probability, with darker colors indicating higher probabilities.
  • Figure 4: Solving occlusion problems with UID (2D). There is a potential problem of smaller folds being occluded by obvious folds due to the viewing angle. By calculating the degree of directional change in the local distribution of Gaussians, the regions with large deformation on the surface are localized and densely processed.
  • Figure 5: To ensure a fair comparison, we compare against NeuralBody_ZJU-MoCap, HumanNeRF, AnimateNeRF, InstantNVR, Hu2023GauHuman, AG, Li2023Human101, and T1 at 512 $\times$ 512 resolution. Our model shows better visual quality and more detail.
  • ...and 4 more figures
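
The Figure 1 caption defines LPIPS* = LPIPS × 10^3, and the reported gains are percentages in that metric. The sketch below shows the scaling and one common way to compute a relative improvement for a lower-is-better metric. The helper names are hypothetical, and the exact formula behind the reported 33.94% and 16.75% is an assumption, since it is not stated in this summary.

```python
def lpips_star(lpips: float) -> float:
    """Scale a raw LPIPS value as in the Figure 1 caption: LPIPS* = LPIPS x 10^3."""
    return lpips * 1e3

def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline.

    Assumed formula: 100 * (baseline - ours) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - ours) / baseline
```

Because the scaling is a constant factor, the relative improvement is the same whether it is computed on LPIPS or on LPIPS*.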