EVA3D: Compositional 3D Human Generation from 2D Image Collections
Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu
TL;DR
Problem addressed: generating high-quality, animatable 3D humans from sparse 2D image collections is challenging because human bodies are articulated and vary widely in pose and viewpoint. EVA3D addresses this with a compositional NeRF made of 16 body-part subnetworks and SMPL-based priors, plus a pose-guided sampling strategy, to learn high-resolution 3D humans without 3D supervision. Key contributions include native high-resolution ($512\times256$) 3D human generation from 2D data, an efficient compositional NeRF representation, delta SDF with SMPL guidance, strong quantitative and qualitative results across four datasets, and support for interpolation and inversion. The work advances inverse graphics for scalable, data-efficient 3D human synthesis, with potential impact on AR/VR/VFX pipelines and downstream tasks.
Abstract
Inverse graphics aims to recover 3D models from 2D observations. Utilizing differentiable rendering, recent 3D-aware generative models have shown impressive results on rigid object generation using 2D images. However, it remains challenging to generate articulated objects, like human bodies, due to their complexity and diversity in poses and appearances. In this work, we propose EVA3D, an unconditional 3D human generative model learned from 2D image collections only. EVA3D can sample 3D humans with detailed geometry and render high-quality images (up to $512\times256$) without bells and whistles (e.g. super resolution). At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts. Each part is represented by an individual volume. This compositional representation enables 1) inherent human priors, 2) adaptive allocation of network parameters, and 3) efficient training and rendering. Moreover, to accommodate the characteristics of sparse 2D human image collections (e.g. imbalanced pose distribution), we propose a pose-guided sampling strategy for better GAN learning. Extensive experiments validate that EVA3D achieves state-of-the-art 3D human generation performance regarding both geometry and texture quality. Notably, EVA3D demonstrates great potential and scalability to "inverse-graphics" diverse human bodies with a clean framework.
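To make the compositional representation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each body part owns an axis-aligned bounding box and a small function that returns a density for points in that box's local coordinates. The names `query_compositional`, `box_min`, `box_max`, and `part_fns` are hypothetical, and the plain averaging over overlapping boxes is an assumption made for brevity; the actual model may blend parts differently.

```python
import numpy as np

def query_compositional(points, box_min, box_max, part_fns):
    """Query K part volumes and merge their outputs per sample point.

    points:   (N, 3) sample positions in a shared canonical space.
    box_min:  (K, 3) lower corners of the per-part bounding boxes.
    box_max:  (K, 3) upper corners of the per-part bounding boxes.
    part_fns: list of K callables mapping (M, 3) local coords to (M,) densities
              (stand-ins for the per-part subnetworks; hypothetical).
    Returns (N,) densities; points outside every box get 0.
    """
    n = points.shape[0]
    total = np.zeros(n)
    count = np.zeros(n)
    for k, fn in enumerate(part_fns):
        # Select only the points that fall inside part k's box.
        inside = np.all((points >= box_min[k]) & (points <= box_max[k]), axis=1)
        if not inside.any():
            continue
        # Map the selected points to the box's local [-1, 1]^3 coordinates.
        center = (box_min[k] + box_max[k]) / 2.0
        half = (box_max[k] - box_min[k]) / 2.0
        local = (points[inside] - center) / half
        total[inside] += fn(local)
        count[inside] += 1
    # Assumption: overlapping parts are merged by a plain average.
    return np.where(count > 0, total / np.maximum(count, 1), 0.0)

# Toy setup: two overlapping boxes with constant-density "subnetworks".
box_min = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
box_max = np.array([[1.0, 1.0, 1.0], [1.5, 1.0, 1.0]])
part_fns = [
    lambda x: np.full(len(x), 1.0),
    lambda x: np.full(len(x), 3.0),
]
points = np.array([
    [0.25, 0.5, 0.5],  # inside box 0 only
    [0.75, 0.5, 0.5],  # inside both boxes
    [2.0, 2.0, 2.0],   # outside every box
])
density = query_compositional(points, box_min, box_max, part_fns)
```

In this toy setup, points that fall outside every box are never passed to any part function. That is one plausible reading of why a part-wise layout can make training and rendering cheaper than querying a single large volume.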
