HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation
Panwang Pan, Tingting Shen, Chenxin Li, Yunlong Lin, Kairun Wen, Jingjing Zhao, Yixuan Yuan
TL;DR
HumanCrafter tackles simultaneous 3D human reconstruction and body-part segmentation from a single image. It introduces a feed-forward pipeline that converts aggregated multi-view features into pixel-aligned 3D Gaussian primitives; the pipeline also includes a second transformer that produces semantic 3D Gaussians, together with a differentiable renderer. The model leverages human priors (e.g., SMPL, Plücker embeddings) and diffusion-based appearance priors, and it enables cross-task learning through a joint render-distillation-segmentation objective. It achieves state-of-the-art results in both 3D segmentation and single-image 3D reconstruction while running in real time. These capabilities support practical applications in AR/VR, editing, and immersive exploration. The work also discusses ethical considerations and future directions.
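As an illustration of the Plücker embeddings mentioned above, the following minimal sketch computes the standard 6D Plücker coordinates of a camera ray: the unit direction d concatenated with the moment o × d. The function name `plucker_embedding` and the example ray are assumptions made for illustration; the summary does not specify how HumanCrafter computes or consumes these embeddings.

```python
import numpy as np

def plucker_embedding(origins, directions):
    """Return 6D Plücker coordinates (d, o x d) for each ray.

    origins, directions: arrays of shape (..., 3).
    This is the textbook definition, not necessarily the paper's exact variant.
    """
    origins = np.asarray(origins, dtype=np.float64)
    directions = np.asarray(directions, dtype=np.float64)
    # Normalize directions so the embedding does not depend on their length.
    d = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    # The moment o x d is identical for every point o on the same ray.
    m = np.cross(origins, d)
    return np.concatenate([d, m], axis=-1)

# Hypothetical example: two different points on one ray share an embedding.
o = np.array([[0.0, 0.0, 1.0]])
d = np.array([[1.0, 0.0, 0.0]])
emb_a = plucker_embedding(o, d)
emb_b = plucker_embedding(o + 2.5 * d, d)
```

Because the moment term does not change as the origin slides along the ray, the embedding identifies the ray itself rather than a particular point on it, which is one common reason such coordinates are used as per-pixel camera conditioning.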
Abstract
Recent advances in generative models have achieved high fidelity in 3D human reconstruction, yet their utility for specific tasks (e.g., 3D human segmentation) remains constrained. We propose HumanCrafter, a unified framework that jointly models appearance and human-part semantics from a single image in a feed-forward manner. Specifically, we integrate human geometric priors in the reconstruction stage and self-supervised semantic priors in the segmentation stage. To address the scarcity of labeled 3D human datasets, we further develop an interactive annotation procedure for generating high-quality data-label pairs. Our pixel-aligned aggregation enables cross-task synergy, while the multi-task objective simultaneously optimizes texture modeling fidelity and semantic consistency. Extensive experiments demonstrate that HumanCrafter surpasses existing state-of-the-art methods in both 3D human-part segmentation and 3D human reconstruction from a single image.
