DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li

TL;DR

DiffPortrait360 addresses 360$^\circ$ view-consistent head synthesis from a single portrait and enables 3D-aware NeRF rendering for diverse subjects. It extends DiffPortrait3D with a back-view generator, a dual-appearance module, a back-view reference, and a view-consistency training regime based on continuous view sequences, all built on a frozen latent diffusion backbone. The approach achieves robust, locally continuous 360-degree consistency across human, stylized, and anthropomorphic heads, outperforming state-of-the-art methods on stylized and real portraits. This capability supports immersive telepresence and scalable personalized content creation by producing high-quality 3D-aware assets from single images.

Abstract

Generating high-quality 360-degree views of human heads from single-view images is essential for enabling accessible immersive telepresence applications and scalable personalized content creation. While cutting-edge methods for full head generation are limited to modeling realistic human heads, the latest diffusion-based approaches for style-omniscient head synthesis can produce only frontal views and struggle with view consistency, preventing their conversion into true 3D models for rendering from arbitrary angles. We introduce a novel approach that generates fully consistent 360-degree head views, accommodating human, stylized, and anthropomorphic forms, including accessories like glasses and hats. Our method builds on the DiffPortrait3D framework, incorporating a custom ControlNet for back-of-head detail generation and a dual appearance module to ensure global front-back consistency. By training on continuous view sequences and integrating a back reference image, our approach achieves robust, locally continuous view synthesis. Our model can be used to produce high-quality neural radiance fields (NeRFs) for real-time, free-viewpoint rendering, outperforming state-of-the-art methods in object synthesis and 360-degree head generation for very challenging input portraits.
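The idea of training on continuous view sequences can be illustrated with a small sketch. It is not the authors' implementation: the orbit resolution (`num_views`), the window length (`window`), and all function names are hypothetical choices made for illustration. It only shows the general pattern of drawing a run of neighbouring camera azimuths from a 360-degree orbit, rather than independent random views, so that adjacent frames in a training batch differ by a small, constant angle.

```python
import random

def orbit_azimuths(num_views):
    """Evenly spaced azimuth angles (degrees) covering a full 360-degree orbit."""
    return [360.0 * i / num_views for i in range(num_views)]

def sample_continuous_window(num_views, window, rng):
    """Pick `window` consecutive view indices, wrapping around the orbit.

    Hypothetical sampler: the start index is random, the rest follow in order.
    """
    if not 0 < window <= num_views:
        raise ValueError("window must be between 1 and num_views")
    start = rng.randrange(num_views)
    return [(start + k) % num_views for k in range(window)]

def max_adjacent_gap(indices, num_views):
    """Largest angular step (degrees) between neighbouring views in a batch."""
    step = 360.0 / num_views
    gaps = []
    for a, b in zip(indices, indices[1:]):
        d = abs(a - b) % num_views
        # Measure the shorter way around the circle.
        gaps.append(min(d, num_views - d) * step)
    return max(gaps) if gaps else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    num_views, window = 36, 8  # assumed values, not taken from the paper
    azimuths = orbit_azimuths(num_views)
    idx = sample_continuous_window(num_views, window, rng)
    print([azimuths[i] for i in idx])
    print(max_adjacent_gap(idx, num_views))
```

With independent random sampling, the angular gap between neighbouring frames in a batch can be arbitrarily large. With the windowed sampler it is always one orbit step, which is the property a view-consistency module would rely on under these assumptions.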

Paper Structure

This paper contains 32 sections, 1 equation, 18 figures, 1 table.

Figures (18)

  • Figure 1: Our DiffPortrait360 enables 360$^\circ$ view-consistent full-head image synthesis. It is universally effective across a diverse range of facial portraits, allowing for the creation of 3D-aware head portraits from single-view images.
  • Figure 2: For full-range 360-degree novel view synthesis, DiffPortrait360 employs a frozen pre-trained Latent Diffusion Model (LDM) as a rendering backbone and incorporates three auxiliary trainable modules for disentangled control: dual appearance $\mathcal{R}$, camera control $\mathcal{C}$, and U-Nets with view consistency $\mathcal{V}$. Specifically, $\mathcal{R}$ extracts appearance information from $I_{\text{ref}}$ and $I_{\text{back}}$, and $\mathcal{C}$ derives the camera pose, which is rendered using an off-the-shelf 3D GAN. During training, we use a continuous sampling strategy to better preserve the continuity of the camera trajectory. We place greater emphasis on continuity between frames so that appearance information is preserved as the viewing angle changes. For inference, we employ our tailored back-view image generation network $\mathcal{F}$ to generate a back-view image, enabling a full 360-degree range of camera trajectories from a single portrait image. Note that $z$ denotes latent-space noise rather than an image.
  • Figure 3: Qualitative comparisons with existing methods on in-the-wild portraits. Compared to the baselines, our method generalizes better to novel view synthesis of in-the-wild portraits with unseen appearances, expressions, and styles, even without any fine-tuning.
  • Figure 4: Qualitative comparisons of novel view synthesis on RenderMe360 [pan2024renderme]. Our method achieves effective appearance control for novel view synthesis under substantial changes in camera view.
  • Figure 5: Ablation Study on Dual Appearance Control.
  • ...and 13 more figures