
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu

TL;DR

Disco4D tackles the problem of generating and animating 3D clothed humans from a single image by introducing a disentangled representation in which clothing is modeled with external Gaussians bound to a fixed SMPL-X body. It combines Gaussian Splatting with diffusion-based texture refinement and a per-Gaussian identity encoding to enable fine-grained clothing editing and robust 4D animation. The method supports both pose-driven animation and data-driven clothing dynamics, delivering improved geometry and temporal consistency compared with prior single-image approaches. Extensive experiments on 3D generation and 4D reconstruction demonstrate clear gains over state-of-the-art baselines and ablations validate key design choices, while acknowledging limitations in SMPL-X estimation and visual hull quality that guide future work.

Abstract

We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Unlike existing methods, Disco4D disentangles clothing (modeled with Gaussians) from the human body (modeled with SMPL-X), significantly enhancing generation detail and flexibility. It introduces the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, e.g., by modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. Our visualizations can be found at https://disco-4d.github.io/.
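
The third innovation, separating clothing assets through per-Gaussian identity encodings, can be illustrated with a minimal sketch. It assumes the encodings have already been learned and that a linear classifier maps each encoding to a part label. The label set, the `Gaussian` record, and the helpers `classify` and `extract_asset` are hypothetical names chosen for illustration; this is not Disco4D's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical label set; the actual set of garment classes is an assumption here.
LABELS = ["body", "upper_garment", "lower_garment"]


@dataclass
class Gaussian:
    position: Tuple[float, float, float]  # world-space mean of the Gaussian
    identity: List[float]  # learned per-Gaussian identity encoding (assumed already trained)


def classify(encoding: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> int:
    """Return the index of the label with the largest linear logit.

    Taking the argmax of the raw logits gives the same answer as the argmax
    of their softmax, so no normalization is needed for hard assignment.
    """
    logits = [sum(w * e for w, e in zip(row, encoding)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def extract_asset(gaussians: Sequence[Gaussian], label: str,
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[Gaussian]:
    """Return the subset of Gaussians whose predicted identity matches `label`."""
    target = LABELS.index(label)
    return [g for g in gaussians if classify(g.identity, weights, bias) == target]
```

With labels assigned this way, extracting one garment is a filter over the Gaussian set, and the remaining Gaussians stay available for the body and the other garments.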

Paper Structure (23 sections, 7 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Disco4D is a novel Gaussian Splatting framework for 4D disentangled human generation, animation and editing from a single image.
  • Figure 2: Overview of Disco4D. (a) 3D generation uses a single image to obtain disentangled body and clothing Gaussians. Body, face and hand poses are refined to be pixel-aligned. For faster initialization, clothing Gaussians and a visual hull are obtained with Gaussian Reconstruction Models. These clothing Gaussians are embedded in the SMPL-X mesh and adopt the local coordinate system of their respective triangles. Subsequently, an iterative optimization process (pruning, identity encoding and densification) separates the body and garments, with the learned identity encodings guiding the densification of the clothing Gaussians. (b) 4D animation is achieved either by directly driving SMPL-X poses or by leveraging a video to learn additional clothing deformation. Given a driving video, we first obtain a static 3D disentangled GS model. Body and clothing Gaussians are deformed by pose transformations, and we then optimize a deformation network to learn additional deformations of the clothing GS at different timestamps. (c) Various 3D/4D editing operations can be performed with our disentangled representation.
  • Figure 3: Qualitative comparison of image generation across DreamGaussian, LGM, SHERF, and Disco4D.
  • Figure 4: Qualitative comparison of 4D generation between DreamGaussian4D, MonoHuman, GART, GaussianAvatar, and Disco4D.
  • Figure 5: 4D reconstruction results on 4D-Dress Dataset.
  • ...and 5 more figures
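
The Figure 2 caption describes embedding clothing Gaussians in the SMPL-X mesh using each triangle's local coordinate system, so that they follow pose transformations. A minimal pure-Python sketch of such a binding for Gaussian means follows. The frame convention is an assumption and is not taken from the paper: the origin is the triangle centroid, the axes are the first edge, the face normal and their cross product, and the scale is the mean edge length. The helper names `triangle_frame`, `bind` and `unbind` are hypothetical. Rotating and scaling the Gaussians' covariances, as well as the learned per-timestamp clothing deformation, are omitted.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def mul(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec3) -> float:
    return math.sqrt(dot(a, a))


def triangle_frame(tri: Triangle):
    """Assumed convention: centroid origin, orthonormal axes (edge, in-plane perpendicular, normal), mean-edge-length scale."""
    v0, v1, v2 = tri
    origin = mul(add(add(v0, v1), v2), 1.0 / 3.0)
    edge = sub(v1, v0)
    e1 = mul(edge, 1.0 / norm(edge))
    n = cross(edge, sub(v2, v0))
    n = mul(n, 1.0 / norm(n))
    e2 = cross(n, e1)  # completes a right-handed orthonormal frame (e1, e2, n)
    k = (norm(sub(v1, v0)) + norm(sub(v2, v1)) + norm(sub(v0, v2))) / 3.0
    return origin, (e1, e2, n), k


def bind(point: Vec3, tri: Triangle) -> Vec3:
    """Express a world-space Gaussian mean in the triangle's local frame."""
    origin, axes, k = triangle_frame(tri)
    d = sub(point, origin)
    return tuple(dot(axis, d) / k for axis in axes)


def unbind(local: Vec3, tri: Triangle) -> Vec3:
    """Map local coordinates back to world space using the (possibly posed) triangle."""
    origin, axes, k = triangle_frame(tri)
    world = origin
    for coord, axis in zip(local, axes):
        world = add(world, mul(axis, coord * k))
    return world
```

Because the local coordinates are stored relative to the triangle, evaluating `unbind` on the posed triangle carries the Gaussian mean along with any rigid motion or uniform scaling of that face.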