HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang, Zeshuai Deng, Zixiong Huang, Mingkui Tan

TL;DR

HiLo reconstructs detailed and robust 3D clothed humans from a single RGB image by combining high-frequency (HF) information from the signed distance field (SDF) of the naked body with low-frequency (LF) information from a voxelized SMPL-X model. It introduces a progressive high-frequency SDF $\mathcal{H}(s;\beta)$ to capture fine geometry and a spatial interaction implicit function $\phi_{si}$ to exploit LF voxel cues, which together yield detailed geometry and robustness to SMPL-X noise. On THuman2.0, CAPE, and in-the-wild images, HiLo outperforms state-of-the-art methods in Chamfer distance and point-to-surface (P2S) distance while converging faster, which makes it relevant to virtual try-on, movies, and games. The HF/LF fusion may also extend to 3D reconstruction tasks beyond clothed humans.
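
To illustrate why a low-resolution voxel grid can act as a noise-tolerant LF cue, the sketch below quantizes 3D points into a coarse occupancy set. It is a minimal illustration, not HiLo's voxelization: the function name `voxelize`, the resolution of 8, and the `[-1, 1]` bounding box are assumptions made here, and a real pipeline would voxelize the full SMPL-X mesh rather than a handful of points.

```python
def voxelize(points, resolution=8, lo=-1.0, hi=1.0):
    """Map 3D points to the set of occupied voxel indices of a coarse grid.

    Hypothetical helper: resolution and bounds are illustrative assumptions.
    """
    scale = resolution / (hi - lo)
    occupied = set()
    for point in points:
        # Quantize each coordinate and clamp it to the valid index range.
        index = tuple(
            min(resolution - 1, max(0, int((c - lo) * scale))) for c in point
        )
        occupied.add(index)
    return occupied


# Two slightly different estimates of the same body point.
clean = [(0.01, 0.0, 0.0)]
noisy = [(0.02, 0.0, 0.0)]

# Both land in the same coarse voxel, so the grid is unchanged by this offset.
print(voxelize(clean) == voxelize(noisy))
```

A perturbation smaller than the voxel size often leaves the grid unchanged, which is the intuition behind using coarse voxels for robustness; the same quantization discards fine detail, which is why an HF branch is needed alongside it.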

Abstract

Reconstructing a 3D clothed human involves creating detailed geometry of individuals in clothing, with applications ranging from virtual try-on to movies and games. To enable practical and widespread use, recent methods generate a clothed human from an RGB image. However, they struggle to reconstruct avatars that are both detailed and robust. We empirically find that the high-frequency (HF) and low-frequency (LF) information from a parametric model can enhance geometric detail and improve robustness to noise, respectively. Based on this, we propose HiLo, clothed human reconstruction with high- and low-frequency information, which contains two components. 1) To recover detailed geometry using HF information, we propose a progressive HF signed distance function that enhances the detailed 3D geometry of a clothed human. Our analysis shows that this progressive learning scheme alleviates the large gradients that hinder model convergence. 2) To achieve reconstruction that is robust to inaccurate estimates of the parametric model, we propose a spatial interaction implicit function that uses LF information. This function exploits the complementary spatial information from a low-resolution voxel grid of the parametric model. Experimental results show that HiLo outperforms state-of-the-art methods by 10.43% and 9.54% in Chamfer distance on the THuman2.0 and CAPE datasets, respectively. HiLo is also robust to noise from the parametric model, challenging poses, and various clothing styles.
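
The Chamfer distance reported above compares a reconstructed surface with the ground truth. The sketch below shows one common form of the metric on sampled point sets. Conventions vary between papers (squared or unsquared distances, sum or average of the two directions), and this section does not say which one HiLo uses, so the unsquared, averaged variant here is an assumption. The `relative_improvement` helper likewise assumes the quoted percentages are relative reductions in error.

```python
import math


def one_sided_distance(source, target):
    """Mean distance from each source point to its nearest target point."""
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in target)
    return total / len(source)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: average of the two one-sided distances."""
    return 0.5 * (one_sided_distance(pred, gt) + one_sided_distance(gt, pred))


def relative_improvement(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# Toy point sets standing in for samples from predicted and true surfaces.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]

print(chamfer_distance(pred, gt))      # 0.5
print(relative_improvement(2.0, 1.5))  # 25.0
```

The one-sided term from predicted points to the ground truth is a point-sampled stand-in for a point-to-surface (P2S) distance; an exact P2S measures distance to the mesh triangles rather than to sampled points. The brute-force nearest-neighbour search is quadratic and only suitable for small examples.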

Paper Structure

This paper contains 34 sections, 14 equations, 21 figures, 7 tables, 1 algorithm.

Figures (21)

  • Figure 1: Visual comparisons on in-the-wild images. HiLo achieves more accurate and detailed reconstructions for challenging poses and diverse clothing.
  • Figure 2: In one toy experiment, we empirically demonstrate that high-frequency (HF) regularization from naked bodies enhances geometric detail. In another, we verify that low-frequency (LF) regularization improves robustness to noise.
  • Figure 3: Overview of our proposed HiLo. Conditioned on a single-view image $\mathcal{I}$ and the corresponding SMPL-X $\mathcal{M}$, we first prepare a signed distance field $s$ and a low-resolution voxel grid $\mathcal{M}_v^{3D}$ of the naked body. Then, our proposed progressive high-frequency signed distance function $\mathcal{H}(s;\beta)$ enhances $s$ to capture the detailed geometry of the clothed human, and its coarse-to-fine learning scheme alleviates the convergence difficulties introduced by large gradients. Moreover, we design an implicit function $\phi_{si}$ that leverages the complementary information of low-frequency voxels from $\mathcal{M}_v^{3D}$ to mitigate various levels of noise. Finally, we feed the combined HF and LF features into $\phi_{si}$ to infer the occupancy field $\hat{\mathcal{O}}$ of the clothed avatar.
  • Figure 4: Illustration of the relationship between progressive weights $\omega$ and $\beta$ during the training process.
  • Figure 5: (a) Complementarity of voxel and . (b) Illustration of the spatial interaction module $\mathcal{A}$.
  • ...and 16 more figures
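
The captions for Figures 3 and 4 describe a coarse-to-fine scheme in which progressive weights $\omega$ depend on $\beta$ during training. The exact form of $\mathcal{H}(s;\beta)$ is not given in this section, so the sketch below is not HiLo's definition. It shows a generic cosine-windowed frequency encoding of a signed distance value, in which higher-frequency bands are switched on gradually as $\beta$ grows. The names `band_weight` and `progressive_encoding`, the number of bands, and the window shape are all assumptions.

```python
import math


def band_weight(beta, k):
    """Cosine window: 0 while beta <= k, rising smoothly to 1 at beta >= k + 1."""
    t = min(max(beta - k, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * t))


def progressive_encoding(s, beta, num_bands=4):
    """Encode a signed distance s with frequency bands gated by beta.

    Hypothetical stand-in for a progressive HF encoding: band k uses
    frequency 2**k * pi and is scaled by band_weight(beta, k).
    """
    features = [s]
    for k in range(num_bands):
        weight = band_weight(beta, k)
        frequency = (2.0 ** k) * math.pi
        features.append(weight * math.sin(frequency * s))
        features.append(weight * math.cos(frequency * s))
    return features


# Early in training (beta = 0) only the raw distance is passed through;
# once beta reaches num_bands, every band is fully active.
early = progressive_encoding(0.3, beta=0.0)
late = progressive_encoding(0.3, beta=4.0)
print(len(early), len(late))
```

Keeping the high-frequency terms at zero weight early on means the network first fits the smooth signal, and the steep derivatives of the high-frequency sinusoids only enter once the coarse shape is in place. This matches the stated motivation of avoiding large gradients, though the schedule HiLo actually uses may differ.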