FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image
Qiao Feng, Yuanwang Yang, Yebin Liu, Yu-Kun Lai, Jingyu Yang, Kun Li
TL;DR
This work tackles real-time monocular 3D human reconstruction by introducing the Fourier Occupancy Field (FOF), which represents the 3D occupancy field $F: [-1,1]^3 \to \{0,0.5,1\}$ as a 2D coefficient field: the occupancy along each line parallel to the $z$-axis is expanded in a truncated Fourier series, and the coefficients are stored per pixel so that the whole field can be processed efficiently by a 2D CNN. To improve robustness and avoid Gibbs artifacts, FOF-X adopts a cosine-series formulation, leverages dual-sided normal maps and an SMPL prior, and makes the inter-conversion between FOF and meshes more robust with an automaton-based discontinuity matcher and a Laplacian coordinate constraint. The pipeline runs in real time (over 30 FPS, e.g., about $0.02$ s per frame on capable GPUs) and achieves state-of-the-art accuracy on THuman2.1, CAPE, and CustomHumans, while remaining compatible with traditional mesh pipelines. Because the representation lives on the image grid yet encodes full 3D occupancy, it bridges 2D image processing and 3D geometry and supports robust, high-fidelity reconstruction from a single image. Future work includes extending the approach to perspective-camera setups, handling very thin structures, and exploring broader scene-level representations.
Abstract
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image. Balancing real-time speed against high-quality results is a persistent challenge, mainly due to the high computational demands of existing 3D representations. To address this, we propose the Fourier Occupancy Field (FOF), an efficient 3D representation learned in the form of a Fourier series. The core of FOF is to factorize a 3D occupancy field into a 2D vector field, which retains the topology and spatial relationships of the 3D domain while remaining compatible with 2D convolutional neural networks. Such a representation bridges the gap between the 3D and 2D domains, enabling the integration of human parametric models as priors and enhancing reconstruction robustness. Based on FOF, we design a new reconstruction framework, FOF-X, to avoid the performance degradation caused by texture and lighting. This enables our real-time reconstruction system to better handle the domain gap between training images and real images. Additionally, in FOF-X, we enhance the inter-conversion algorithms between FOF and mesh representations with a Laplacian constraint and an automaton-based discontinuity matcher, improving both quality and robustness. We validate the strengths of our approach on different datasets and real-captured data, where FOF-X achieves new state-of-the-art results. The code is available for research purposes at https://cic.tju.edu.cn/faculty/likun/projects/FOFX/index.html.
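To make the factorization of a 3D occupancy field into a 2D vector field concrete, the following is a minimal NumPy sketch and not the released implementation. It assumes the occupancy is sampled on a regular voxel grid with values 1 inside and 0 outside, that depth $z \in [-1,1]$ is remapped to $t = (z+1)/2 \in [0,1]$, and that each pixel stores the first `num_terms` cosine-series coefficients of its depth profile. The normalization, the function names, and the helper `make_sphere_occupancy` are illustrative assumptions; the paper's exact conventions may differ.

```python
import numpy as np


def _cosine_basis(num_terms, depth):
    """Cosine basis cos(n * pi * t) sampled at `depth` midpoints t in (0, 1)."""
    t = (np.arange(depth) + 0.5) / depth          # assumed mapping t = (z + 1) / 2
    n = np.arange(num_terms)
    return np.cos(np.pi * n[:, None] * t[None, :])  # shape (num_terms, depth)


def occupancy_to_cosine_coeffs(occ, num_terms):
    """Map an (H, W, D) occupancy grid to an (H, W, num_terms) coefficient field.

    Uses f(t) ~= a_0 + sum_{n>=1} a_n cos(n pi t) with
    a_0 = int_0^1 f dt and a_n = 2 int_0^1 f cos(n pi t) dt (midpoint rule).
    """
    depth = occ.shape[-1]
    basis = _cosine_basis(num_terms, depth)
    scale = np.where(np.arange(num_terms) == 0, 1.0, 2.0)
    return (occ @ basis.T) * scale / depth


def cosine_coeffs_to_occupancy(coeffs, depth):
    """Evaluate the truncated cosine series back on `depth` samples along z."""
    basis = _cosine_basis(coeffs.shape[-1], depth)
    return coeffs @ basis                          # shape (H, W, depth)


def make_sphere_occupancy(res, depth, radius):
    """Hypothetical test shape: a solid sphere inside [-1, 1]^3."""
    xy = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    z = (np.arange(depth) + 0.5) / depth * 2.0 - 1.0
    x, y, zz = np.meshgrid(xy, xy, z, indexing="ij")
    return (x**2 + y**2 + zz**2 < radius**2).astype(np.float64)


if __name__ == "__main__":
    occ = make_sphere_occupancy(res=32, depth=128, radius=0.6)
    coeffs = occupancy_to_cosine_coeffs(occ, num_terms=32)   # 2D field of vectors
    recon = cosine_coeffs_to_occupancy(coeffs, depth=128)
    # Threshold at 0.5 to recover a binary occupancy from the smooth series.
    agreement = np.mean((recon > 0.5) == (occ > 0.5))
    print("coefficient field shape:", coeffs.shape)
    print("voxel agreement after truncation:", agreement)
```

In this sketch the coefficient field has the same spatial layout as the input image, which is what allows a 2D network to predict it directly. Thresholding the reconstructed series at 0.5 recovers the surface location, and using as many terms as depth samples reproduces the sampled occupancy exactly, so the truncation length controls the trade-off between compactness and detail.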
