GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior

Zichen Tang; Yuan Yao; Miaomiao Cui; Liefeng Bo; Hongyu Yang

TL;DR

This work addresses the challenge of generating identity-preserving, realistic 3D human avatars from text and image prompts. It introduces GaussianIP, a two-stage framework that couples 3D Gaussian Splatting with a human-centric diffusion prior. The first stage employs Adaptive Human Distillation Sampling (AHDS) to efficiently distill identity-relevant cues, while the second stage uses View-Consistent Refinement (VCR) to enhance facial and garment details with cross-view texture coherence. Empirical results show improved visual quality and faster training compared to state-of-the-art baselines, highlighting the practical impact for AR/VR and personalized digital humans; limitations suggest extending the approach to more complex poses and interactions in future work.

Abstract

Text-guided 3D human generation has advanced with the development of efficient 3D representations and 2D-lifting methods like Score Distillation Sampling (SDS). However, current methods suffer from prolonged training times and often produce results that lack fine facial and garment details. In this paper, we propose GaussianIP, an effective two-stage framework for generating identity-preserving realistic 3D humans from text and image prompts. Our core insight is to leverage human-centric knowledge to facilitate the generation process. In stage 1, we propose a novel Adaptive Human Distillation Sampling (AHDS) method to rapidly generate a 3D human that maintains high identity consistency with the image prompt and achieves a realistic appearance. Compared to traditional SDS methods, AHDS better aligns with the human-centric generation process, enhancing visual quality with notably fewer training steps. To further improve the visual quality of the face and clothing regions, we design a View-Consistent Refinement (VCR) strategy in stage 2. Specifically, it iteratively produces detail-enhanced versions of the multi-view images from stage 1, ensuring 3D texture consistency across views via mutual attention and distance-guided attention fusion. A polished version of the 3D human is then obtained by directly performing reconstruction with the refined images. Extensive experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency, particularly in generating identity-preserving results. Our code is available at: https://github.com/silence-tang/GaussianIP.
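
For context, the standard SDS gradient that AHDS is compared against is commonly written as below. This is a sketch of the generic SDS baseline only, not of AHDS, whose formulation the abstract does not give. The symbols are assumptions following common notation: $\theta$ denotes the 3D representation parameters (here, the Gaussians), $g$ the differentiable renderer, $y$ the conditioning prompt, $\hat{\epsilon}_\phi$ the pretrained diffusion noise predictor, and $w(t)$ a timestep weighting.

```latex
% Generic SDS gradient (baseline notation, assumed; not the AHDS objective)
\begin{aligned}
x &= g(\theta), \qquad
x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\epsilon}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Each update renders a view, noises it to a sampled timestep $t$, and pushes the rendering toward what the diffusion prior predicts. Because every step requires a forward pass through the diffusion model, reducing the number of training steps directly reduces training time.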
