HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High- and Low-Frequency Information of Parametric Models
Yifan Yang, Dong Liu, Shuhai Zhang, Zeshuai Deng, Zixiong Huang, Mingkui Tan
TL;DR
HiLo reconstructs detailed and robust 3D clothed humans from a single RGB image by exploiting high-frequency (HF) information from the signed distance function (SDF) of the naked body and low-frequency (LF) information from a voxelized SMPL-X model. It introduces a progressive HF SDF $\mathcal{H}(s;\beta)$ to capture fine geometry and a spatial interaction implicit function $\phi_{si}$ to exploit LF voxel cues, yielding detailed geometry and robustness to SMPL-X estimation noise. Across Thuman2.0, CAPE, and in-the-wild images, HiLo outperforms state-of-the-art methods in Chamfer distance and point-to-surface (P2S) distance while converging faster, which supports its use in applications such as virtual try-on, movies, and games. The HF/LF fusion framework may also extend to 3D reconstruction tasks beyond clothed humans.
Abstract
Reconstructing a 3D clothed human involves recovering the detailed geometry of an individual in clothing, with applications ranging from virtual try-on to movies and games. To enable practical and widespread use, recent methods generate a clothed human from a single RGB image. However, they struggle to produce avatars that are both detailed and robust. We find empirically that the high-frequency (HF) and low-frequency (LF) information from a parametric model can enhance geometric detail and improve robustness to noise, respectively. Based on this, we propose HiLo, namely clothed human reconstruction with high- and low-frequency information, which contains two components. 1) To recover detailed geometry using HF information, we propose a progressive HF Signed Distance Function (SDF) that enhances the detailed 3D geometry of a clothed human. Our analysis shows that this progressive learning scheme alleviates the large gradients that hinder model convergence. 2) To make reconstruction robust to inaccurate estimates of the parametric model using LF information, we propose a spatial interaction implicit function. This function exploits the complementary spatial information from a low-resolution voxel grid of the parametric model. Experimental results demonstrate that HiLo outperforms state-of-the-art methods by 10.43% and 9.54% in terms of Chamfer distance on the Thuman2.0 and CAPE datasets, respectively. Additionally, HiLo is robust to noise from the parametric model, challenging poses, and various clothing styles.
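For readers unfamiliar with the headline metric, the following is a minimal sketch of a symmetric Chamfer distance between two small point sets, together with the relative-improvement calculation that a percentage figure such as those above typically expresses. It is an illustration under stated assumptions and not the authors' evaluation code: the function names, the brute-force nearest-neighbor search, the unsquared Euclidean distance, and the averaging of the two directions are all choices made here, and the paper's protocol (for example, how points are sampled from meshes) may differ.

```python
import math

def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (assumed convention: mean of both directions)."""
    d_ab = nearest_distances(a, b)  # a -> b direction
    d_ba = nearest_distances(b, a)  # b -> a direction
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))

def relative_improvement(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - ours) / baseline

# Toy point sets (hypothetical data, not taken from any dataset).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(pred, gt)
print(cd)
```

The brute-force search is quadratic in the number of points, so for meshes with many sampled points a spatial index such as a k-d tree would normally replace it.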
