
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu

TL;DR

HumanGaussian tackles text-driven 3D human generation by combining 3D Gaussian Splatting with structure-aware guidance. Gaussians are initialized on an SMPL-X body surface and optimized with a dual-branch SDS that distills from a texture-structure joint diffusion model, so appearance (RGB) and structure (depth) are learned together; an annealed negative prompt strategy counters over-saturation. A final prune-only phase removes floating artifacts, yielding high-quality geometry and appearance efficiently. In qualitative comparisons and user-study evaluations, the method shows competitive visual fidelity and better efficiency than existing text-to-3D human baselines, pointing toward scalable, controllable 3D human generation from text prompts.
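Initializing Gaussians on the SMPL-X surface amounts to sampling points on a triangle mesh and using them as Gaussian centers. The sketch below is a minimal illustration, assuming the body mesh is already available as NumPy vertex and face arrays (loading SMPL-X itself is out of scope). `sample_surface_points` is a hypothetical helper, and area-weighted uniform sampling is a standard choice that is not necessarily the paper's exact scheme.

```python
import numpy as np


def sample_surface_points(vertices: np.ndarray, faces: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Uniformly sample n points on a triangle mesh surface (area-weighted)."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3): the three corner positions of every triangle
    # Triangle areas from the cross product of two edge vectors.
    areas = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    # Pick faces in proportion to their area so point density is uniform over the surface.
    face_idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    r1 = np.sqrt(rng.random(n))
    r2 = rng.random(n)
    w0 = 1.0 - r1
    w1 = r1 * (1.0 - r2)
    w2 = r1 * r2
    t = tri[face_idx]
    return w0[:, None] * t[:, 0] + w1[:, None] * t[:, 1] + w2[:, None] * t[:, 2]


# Usage on a toy mesh: a unit square in the z = 0 plane made of two triangles.
square_vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
square_faces = np.array([[0, 1, 2], [0, 2, 3]])
centers = sample_surface_points(square_vertices, square_faces, n=10_000)  # initial Gaussian centers
```

Weighting faces by area matters because body meshes have triangles of very different sizes; sampling faces uniformly instead would over-populate densely tessellated regions such as the face and hands.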

Abstract

Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations such as meshes or neural fields via score distillation sampling (SDS), and they suffer from either inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage and growth, and that this adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry: the multi-modal score function from both RGB and depth space is leveraged to drive the Gaussian densification and pruning process through distillation. 2) We devise an Annealed Negative Prompt Guidance that decomposes SDS into a noisier generative score and a cleaner classifier score, which effectively addresses the over-saturation issue. Floating artifacts are further eliminated based on Gaussian size in a prune-only phase to improve generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, which renders vivid 3D humans under diverse scenarios. Project Page: https://alvinliu0.github.io/projects/HumanGaussian
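The split of SDS into a generative score and a classifier score can be written out for standard classifier-free guidance (CFG). The following is the standard algebra only, in common SDS notation that this summary does not itself define (assumed here): $\theta$ are the 3D parameters, $\mathbf{x}$ the rendering, $\mathbf{x}_t$ its noised version, $\epsilon_\phi$ the denoiser, $y$ the text prompt, $\varnothing$ the empty prompt, $\omega$ the guidance scale, and $w(t)$ a timestep weighting. The paper's own notation and its annealed negative-prompt term are not reproduced here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon\bigr)\,\frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad
\hat{\epsilon}_\phi(\mathbf{x}_t; y, t)
  = \epsilon_\phi(\mathbf{x}_t; \varnothing, t)
  + \omega\,\bigl(\epsilon_\phi(\mathbf{x}_t; y, t) - \epsilon_\phi(\mathbf{x}_t; \varnothing, t)\bigr)

\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon
  = \underbrace{\bigl(\epsilon_\phi(\mathbf{x}_t; y, t) - \epsilon\bigr)}_{\delta_{\mathrm{gen}}\ \text{(generative score)}}
  + (\omega - 1)\,\underbrace{\bigl(\epsilon_\phi(\mathbf{x}_t; y, t) - \epsilon_\phi(\mathbf{x}_t; \varnothing, t)\bigr)}_{\delta_{\mathrm{cls}}\ \text{(classifier score)}}
```

Expanding the right-hand side gives $\omega\,\epsilon_\phi(\mathbf{x}_t; y, t) - (\omega - 1)\,\epsilon_\phi(\mathbf{x}_t; \varnothing, t) - \epsilon$, which matches the left-hand side. Only $\delta_{\mathrm{gen}}$ subtracts the sampled noise $\epsilon$ directly, which is one way to see why it is the noisier term. Replacing $\varnothing$ in $\delta_{\mathrm{cls}}$ with a negative prompt is a common way to inject negative guidance; how HumanGaussian weights and anneals that negative term is specified in the paper, not here.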

Paper Structure (11 sections, 9 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: We propose HumanGaussian, an efficient yet effective framework that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our method adapts 3D Gaussian Splatting into text-driven 3D human generation with novel designs.
  • Figure 2: Overview of the proposed HumanGaussian framework. We generate high-quality 3D humans from text prompts using 3D Gaussian Splatting (3DGS) as the neural representation. In Structure-Aware SDS, we start from the SMPL-X prior and densely sample Gaussians on the human mesh surface as initial center positions. A Texture-Structure Joint Model is then trained to simultaneously denoise the image $\mathbf{x}$ and depth $\mathbf{d}$ conditioned on the pose skeleton $\mathbf{p}$. Building on this model, we design a dual-branch SDS that jointly optimizes human appearance and geometry, where the 3DGS density is adaptively controlled by distilling from both the RGB and depth spaces. In Annealed Negative Prompt Guidance, we use the cleaner classifier score with an annealed negative score to regularize the high-variance stochastic SDS gradient. Floating artifacts are further eliminated based on Gaussian size in a prune-only phase to improve generation smoothness.
  • Figure 3: Visual Comparisons with Text-to-3D and 3D Human Models. We compare with recent state-of-the-art baselines on five different prompts, each shown from two camera views. Unrealistic and blurry textures are highlighted with yellow arrows; geometric artifacts are highlighted with green rectangles. Zoom in for the best view, and see the demo video for more results.
  • Figure 4: Ablation Studies on HumanGaussian Module Design. We present generation results of the human frontal view under five ablation settings for better visualization and comparisons: (A) baseline; (B) +SMPL-X, Pose-Cond.; (C) +Neg. Guidance, CFG = 7.5; (D) +Dual-Branch SDS; (E) +Size-based Prune. The detailed ablation settings and result analysis are elaborated in the ablation section.
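The size-based prune-only phase described in the Figure 2 caption can be sketched as a threshold on each Gaussian's largest scale axis. This is a hypothetical helper under stated assumptions: per-axis scales are stored in log space (as in the original 3DGS code), `max_scale` is a hand-picked threshold, and oversized Gaussians are the ones treated as floaters. The paper's actual criterion and threshold may differ.

```python
import numpy as np


def prune_by_size(means: np.ndarray, log_scales: np.ndarray, opacities: np.ndarray,
                  max_scale: float) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Prune-only step: drop Gaussians whose largest axis exceeds max_scale.

    Nothing is cloned or split here; Gaussians are only removed.
    """
    # Assumption: scales are stored in log space, so exponentiate to get world-space extents.
    scales = np.exp(log_scales)               # (N, 3)
    keep = scales.max(axis=1) <= max_scale    # (N,) boolean mask of Gaussians to keep
    return means[keep], log_scales[keep], opacities[keep], keep


# Usage: four Gaussians, two of which have an axis longer than 0.1.
means = np.zeros((4, 3))
log_scales = np.log(np.array([[0.01, 0.01, 0.01],
                              [0.50, 0.01, 0.01],
                              [0.02, 0.03, 0.02],
                              [0.20, 0.20, 0.20]]))
opacities = np.array([0.9, 0.8, 0.7, 0.6])
kept_means, kept_log_scales, kept_opacities, keep = prune_by_size(means, log_scales, opacities, max_scale=0.1)
```

In a full pipeline, the same `keep` mask would also be applied to the remaining per-Gaussian attributes (rotations, colors) and to any optimizer state tied to them.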