You Only Need One Stage: Novel-View Synthesis From A Single Blind Face Image

Taoyue Wang, Xiang Zhang, Xiaotian Li, Huiyuan Yang, Lijun Yin

TL;DR

NVB-Face is a one-stage method that generates consistent Novel-View images directly from a single Blind Face image. It leverages the generative capacity of a diffusion model to synthesize high-quality, consistent novel-view face images.

Abstract

We propose a novel one-stage method, NVB-Face, for generating consistent Novel-View images directly from a single Blind Face image. Existing approaches to novel-view synthesis for objects or faces typically require a high-resolution RGB image as input. When dealing with degraded images, the conventional pipeline follows a two-stage process: first restoring the image to high resolution, then synthesizing novel views from the restored result. However, this approach is highly dependent on the quality of the restored image, often leading to inaccuracies and inconsistencies in the final output. To address this limitation, we extract single-view features directly from the blind face image and introduce a feature manipulator that transforms these features into 3D-aware, multi-view latent representations. Leveraging the powerful generative capacity of a diffusion model, our framework synthesizes high-quality, consistent novel-view face images. Experimental results show that our method significantly outperforms traditional two-stage approaches in both consistency and fidelity.

Paper Structure (27 sections, 9 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: We compare our method with typical two-stage pipelines, such as CodeFormer zhou2022towards + PanoHead-PTI an2023panohead, which first restore the degraded image and then synthesize novel views. When the restoration stage fails to recover accurate details, these errors are amplified during novel-view synthesis, producing results that deviate significantly from the original identity and appearance. In contrast, our method generates novel views in a single stage directly from the low-quality input. This end-to-end design suppresses error accumulation, resulting in more reliable and faithful novel-view images.
  • Figure 2: An overview of the proposed NVB-Face architecture. (a) Our first training step focuses solely on image restoration. (b) In the second training step, we update only the parameters of the newly introduced modules (highlighted in dark green), keeping the rest of the network frozen. After training, this two-step process forms our complete inference pipeline.
  • Figure 3: Qualitative comparisons on the NeRSemble kirschstein2023nersemble dataset. Our end-to-end strategy achieves superior perceptual quality and preserves identity and expression information more effectively than two-stage methods, minimizing the loss of critical facial attributes.
  • Figure 4: Qualitative comparisons on the LFW-Test huang2008labeled dataset. Our method produces stable results across varying levels of input degradation. Compared to other approaches, our generated images preserve the most information from the original input and exhibit higher visual realism, even under severe degradation.
  • Figure 5: Qualitative ablation study on the LFW-Test dataset, comparing our method with and without the feature loss.
  • ...and 6 more figures
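
The two-step training schedule described in the Figure 2 caption (train for restoration first, then update only the newly introduced modules while the rest stays frozen) can be illustrated with a toy sketch. This is not the authors' implementation: the parameter names `restoration_backbone` and `new_module`, the scalar "weights", and the gradient values are all hypothetical stand-ins chosen only to show how selective freezing behaves.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A toy scalar parameter with a flag saying whether training may update it."""
    value: float
    trainable: bool = True


def set_trainable(params, trainable_names):
    """Mark only the named parameters as trainable; freeze everything else."""
    for name, p in params.items():
        p.trainable = name in trainable_names


def sgd_step(params, grads, lr=0.1):
    """Apply one gradient-descent update, skipping frozen parameters."""
    for name, p in params.items():
        if p.trainable:
            p.value -= lr * grads[name]


def snapshot(params):
    """Copy the current parameter values into a plain dict."""
    return {name: p.value for name, p in params.items()}


# Hypothetical parameter groups (assumed names, not from the paper's code).
params = {
    "restoration_backbone": Param(1.0),
    "new_module": Param(0.0),
}
# Made-up gradients, reused for both steps purely for illustration.
grads = {"restoration_backbone": 0.5, "new_module": -2.0}

# Step (a): only the restoration pathway is trained.
set_trainable(params, {"restoration_backbone"})
sgd_step(params, grads)
after_step_a = snapshot(params)

# Step (b): the previously trained part is frozen; only the new module updates.
set_trainable(params, {"new_module"})
sgd_step(params, grads)
after_step_b = snapshot(params)

print("after step (a):", after_step_a)
print("after step (b):", after_step_b)
```

In a real deep-learning framework the same effect is typically achieved by disabling gradients for the frozen parameters or by passing only the new modules' parameters to the optimizer; the sketch keeps everything in plain Python so the freezing logic is visible.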