Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion

Haoran Wei, Wencheng Han, Xingping Dong, Jianbing Shen

TL;DR

This paper proposes a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits.

Abstract

Recent diffusion-based single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations. However, these methods usually struggle to produce high-fidelity 3D models, frequently yielding excessively blurred textures. We attribute this issue to insufficient consideration of cross-view consistency during the diffusion process, which results in significant disparities between views and ultimately leads to blurred 3D representations. In this paper, we address this issue by comprehensively exploiting multi-view priors in both the conditioning and diffusion procedures to produce consistent, detail-rich portraits. From the conditioning standpoint, we propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits. From the diffusion perspective, considering the significant impact of the diffusion noise distribution on detailed texture generation, we propose a Multi-View Noise Resampling Strategy, integrated within the optimization process, that leverages cross-view priors to enhance representation consistency. Extensive experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image. The project page is at https://haoran-wei.github.io/Portrait-Diffusion.

Paper Structure

This paper contains 13 sections, 19 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The Portrait Diffusion framework, which comprises three modules. GAN-prior Portrait Initialization employs existing portrait GAN priors to derive initial tri-plane NeRF features from frontal-view portrait images. Portrait Geometry Restoration reconstructs the geometry from these initialized tri-planes. Multi-view Diffusion Texture Refinement transforms coarse textures into detailed representations.
  • Figure 2: Overview of our proposed Hybrid Priors Portrait Diffusion model (a) and Multi-View Noise Resampling Strategy (b). HPDM leverages various multi-view priors in a hybrid manner to condition the novel-view synthesis process for more consistent status. MV-NRS transfers cross-view priors to control the diffusion noise distribution for representation alignment.
  • Figure 3: Qualitative comparison to SOTA approaches: Portrait3D [wu2024portrait3d], Wonder3D [long2024wonder3d], and DreamCraft3D [sun2023dreamcraft3d]. Our method presents the most photorealistic 3D portraits, with the most detailed textures in the face and hair strands. Zoom in for more detailed insights.
  • Figure 4: Visual results for the ablation study on the Hybrid Priors Diffusion Model.
  • Figure 5: Visual results for the ablation study on the Multi-View Noise Resampling Strategy.