Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu
TL;DR
Disco4D generates and animates clothed 3D humans from a single image using a disentangled representation: the body is modeled with SMPL-X, and clothing is modeled with separate Gaussians bound to that body. It combines Gaussian Splatting with diffusion-based texture refinement and a per-Gaussian identity encoding, which allows individual garments to be separated and edited and supports 4D animation. The method handles both pose-driven animation and data-driven clothing dynamics, with improved geometry and temporal consistency compared with prior single-image approaches. Experiments on 3D generation and 4D reconstruction show gains over state-of-the-art baselines, and ablations validate the key design choices. The authors acknowledge limitations in SMPL-X estimation and visual hull quality, which point to directions for future work.
Abstract
We present \textbf{Disco4D}, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Unlike existing methods, Disco4D disentangles clothing (modeled with Gaussians) from the human body (modeled with SMPL-X), significantly enhancing generation detail and flexibility. It makes the following technical contributions. \textbf{1)} Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. \textbf{2)} It adopts diffusion models to enhance the 3D generation process, \textit{e.g.}, by modeling occluded parts that are not visible in the input image. \textbf{3)} It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. Our visualizations can be found at \url{https://disco-4d.github.io/}.
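
To make the disentangled representation more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each clothing Gaussian stores an offset from an anchor point on the body and a vector of per-category identity scores. All names (`place_clothing`, `select_by_identity`, `anchor_index`, `local_offsets`, `identity_scores`) and the toy values are hypothetical, and a plain translation stands in for SMPL-X posing.

```python
import numpy as np

def place_clothing(body_points, anchor_index, local_offsets):
    """World positions of clothing Gaussians stored as offsets from body anchors."""
    return body_points[anchor_index] + local_offsets

def select_by_identity(identity_scores, category):
    """Boolean mask of Gaussians whose highest-scoring category is `category`."""
    return identity_scores.argmax(axis=1) == category

# Toy body with two anchor points (a stand-in for a body surface; assumption).
body_points = np.array([[0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])

# Three clothing Gaussians: the anchor each one is bound to, and its offset.
anchor_index = np.array([0, 1, 1])
local_offsets = np.array([[0.1, 0.0, 0.0],
                          [0.0, 0.1, 0.0],
                          [0.0, 0.0, 0.1]])

# Hypothetical identity scores over two clothing categories (columns 0 and 1).
identity_scores = np.array([[0.9, 0.1],
                            [0.2, 0.8],
                            [0.3, 0.7]])

positions = place_clothing(body_points, anchor_index, local_offsets)

# Extract one clothing asset: keep the Gaussians assigned to category 1.
mask = select_by_identity(identity_scores, category=1)
extracted = positions[mask]

# Moving the body moves the bound clothing with it (translation only here).
shift = np.array([1.0, 0.0, 0.0])
moved = place_clothing(body_points + shift, anchor_index, local_offsets)
```

Because the clothing is stored relative to the body, changing `body_points` repositions every bound Gaussian without touching the clothing parameters, and the identity mask isolates a single garment independently of the body.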
