HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model
Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-kun Lai, Kun Li
TL;DR
This work addresses the challenge of generating editable, physically-layered 3D humans from text prompts by introducing a physically-decoupled diffusion framework. It uses a two-stage pipeline: (i) canonical body generation via NeRF controlled by SMPL and ControlNet, and (ii) dual-representation decoupling (DRD) for clothing with multi-layer fusion rendering. An SMPL-powered implicit deformation network (SID Net) then aligns garments with varying body shapes. Key contributions include the DRD for disentangled clothing semantics, a multi-layer rendering strategy for layering, and the SID Net for accurate garment–body matching, enabling clothing transfer across identities and pose-driven animation. The approach achieves state-of-the-art layered 3D human generation, supports virtual try-on and layered animation, and offers practical benefits for digital avatars, fashion, and entertainment. Future work targets more precise deformation proxies and collision-aware optimization. The technical core relies on loss terms such as $L_{body}$, $L_{SDS}$, $L_n$, $L_n^{reg}$, $\mathcal{L}_{reg\_ds}$, $\mathcal{L}_r$, $\mathcal{L}_{match}$, and $\mathcal{L}_{offset}^{reg}$ to drive accurate, decoupled garment generation and fitting.
Abstract
This paper aims to generate physically-layered 3D humans from text prompts. Existing methods either generate 3D clothed humans as a whole or support only tight and simple clothing generation, which limits their applicability to virtual try-on and part-level editing. To achieve physically-layered 3D human generation with reusable and complex clothing, we propose a novel layer-wise dressed human representation based on a physically-decoupled diffusion model. Specifically, to achieve layer-wise clothing generation, we propose a dual-representation decoupling framework for generating clothing decoupled from the human body, in conjunction with an innovative multi-layer fusion volume rendering method. To match the clothing with different body shapes, we propose an SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. Extensive experiments demonstrate that our approach not only achieves state-of-the-art layered 3D human generation with complex clothing but also supports virtual try-on and layered human animation.
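
To make the idea of rendering several layers through one volume renderer concrete, here is a minimal NumPy sketch. It is not the paper's multi-layer fusion method, whose details are not given in this summary. It assumes a common generic scheme: per-layer densities sampled at the same points along a ray are summed, colours are mixed in proportion to density, and the result goes through the standard volume rendering quadrature. The function names `fuse_layers` and `render_ray` and the toy values are hypothetical.

```python
import numpy as np

def fuse_layers(sigmas, colors):
    """Fuse per-layer fields sampled at the same ray points (assumed scheme).

    sigmas: (L, N) densities for L layers at N samples.
    colors: (L, N, 3) RGB values for each layer and sample.
    Returns the summed density (N,) and density-weighted colour (N, 3).
    """
    sigma = sigmas.sum(axis=0)
    # Each layer contributes colour in proportion to its share of the density.
    weights = sigmas / np.maximum(sigma, 1e-8)
    color = (weights[..., None] * colors).sum(axis=0)
    return sigma, color

def render_ray(sigma, color, deltas):
    """Standard volume rendering quadrature along one ray."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    w = trans * alpha
    return (w[:, None] * color).sum(axis=0), w.sum()

# Toy ray with 4 samples: a blue garment layer sits in front of a red body layer.
body_sigma = np.array([0.0, 0.0, 50.0, 50.0])
cloth_sigma = np.array([0.0, 50.0, 0.0, 0.0])
red = np.tile([1.0, 0.0, 0.0], (4, 1))
blue = np.tile([0.0, 0.0, 1.0], (4, 1))

sigma, color = fuse_layers(np.stack([body_sigma, cloth_sigma]),
                           np.stack([red, blue]))
rgb, opacity = render_ray(sigma, color, deltas=np.ones(4))
```

In this toy setup the garment occludes the body, so the rendered pixel is dominated by the garment colour. Removing the garment row from the stacked inputs renders the body alone, which is the kind of per-layer reuse a layered representation is meant to allow.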
