Lite2Relight: 3D-aware Single Image Portrait Relighting

Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

TL;DR

Lite2Relight addresses the challenge of photorealistic 3D portrait relighting from a single image by integrating a lightstage-supervised, 3D-aware EG3D prior with an encoder-based inversion and a relighting network. It introduces Adaptive Feature Alignment to fuse inverted latent codes with target illumination, enabling 3D-consistent pose synthesis and physically plausible lighting under HDRI maps at interactive frame rates. The approach demonstrates superior generalization to in-the-wild portraits, preserving identity details (eyes, expression, accessories) across novel viewpoints and illumination, and outperforms state-of-the-art methods on metrics like SSIM, LD, and PSNR. This work advances interactive, high-fidelity portrait editing for AR/VR, offering robust relighting and 3D editing without heavy optimization, with code and pretrained models released publicly.

Abstract

Achieving photorealistic 3D view synthesis and relighting of human portraits is pivotal for advancing AR/VR applications. Existing methodologies in portrait relighting demonstrate substantial limitations in terms of generalization and 3D consistency, coupled with inaccuracies in physically realistic lighting and identity preservation. Furthermore, personalization from a single view is difficult to achieve and often requires multiview images during the testing phase or involves slow optimization processes. This paper introduces Lite2Relight, a novel technique that can predict 3D consistent head poses of portraits while performing physically plausible light editing at interactive speed. Our method uniquely extends the generative capabilities and efficient volumetric representation of EG3D, leveraging a lightstage dataset to implicitly disentangle face reflectance and perform relighting under target HDRI environment maps. By utilizing a pre-trained geometry-aware encoder and a feature alignment module, we map input images into a relightable 3D space, enhancing them with a strong face geometry and reflectance prior. Through extensive quantitative and qualitative evaluations, we show that our method outperforms the state-of-the-art methods in terms of efficacy, photorealism, and practical application. This includes producing 3D-consistent results of the full head, including hair, eyes, and expressions. Lite2Relight paves the way for large-scale adoption of photorealistic portrait editing in various domains, offering a robust, interactive solution to a previously constrained problem. Project page: https://vcai.mpi-inf.mpg.de/projects/Lite2Relight/

Paper Structure (27 sections, 12 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Method Overview. (a) Given an input image $I_{s}$, we use a pretrained encoder $\mathcal{E}$ to invert $I_{s}$ and obtain the latent vector ${w}^+_{s}$. We pass ${w}^+_{s}$ through a pretrained EG3D network to render the inverted image $I_{w^+}$ and extract convolutional features $G^k_{s}$ from $\mathcal{G}_{sg}$. (b) Next, we use image residual $\Delta{I}$ and $G^k_{s}$ as inputs to the AFA module, to obtain $F_{s}$. (c) Given a target environment map $E_{t}$, our relighting network $\mathcal{R}$ generates $\Delta{w}$, which is combined with ${w}^+_{s}$ to produce the relit latent code $\hat{w}^+_{t}$. (d) Subsequently, we obtain $F_{t}$ by following Eq. \ref{eq:feat_manipulate}. (e) Finally, we replace the $k$-th convolutional feature of $\mathcal{G}_{sg}$ by $F_{t}$ and perform a full forward pass through the EG3D network with the latent code $\hat{w}^+_{t}$ to generate $\hat{I}_{t}$, which is relit by $E_{t}$. Note: $\mathcal{G}_{dec}$ takes camera pose $c$ as input.
  • Figure 2: Qualitative Results: Relighting in-the-wild Portraits. col. (column) 1: input in-the-wild image, col. 2: image relit with HDRI environment maps (inset) ($E_1$) under the same viewpoint. col. 3: Novel View (NV) 1 with a different environment map ($E_2$). col. 4 and 5: NV 2 and 3 under the same map. This figure demonstrates that Lite2Relight can generalize robustly to in-the-wild images, preserve subject-specific face semantics and perform relighting under various environment maps simultaneously. Image credits to Flickr.
  • Figure 3: Qualitative Results: Comparisons to Previous Works. We compare with NFL [nerffacelighting], VoRF [prao2022vorf, prao2023vorf] and PhotoApp [mallikarjun2021photoapp]. For each method, including ours, a single input view is used to generate novel views alongside relighting of the lightstage subjects. Compared with the leading state-of-the-art techniques, Lite2Relight is better at maintaining subject identity and capturing finer details.
  • Figure 4: Qualitative Results: Relighting in-the-wild Portraits. col. (column) 1: input in-the-wild image, col. 2: image relit with HDRI environment maps (inset) ($E_1$) under the same viewpoint. col. 3: Novel View (NV) 1 with a different environment map ($E_2$). col. 4 and 5: NV 2 and 3 under the same map. We show additional results to demonstrate generalization, 3D consistent pose of subjects, and relighting results of Lite2Relight for in-the-wild images. Image credits to Steven R. Livingstone and Flickr.
  • Figure 5: Qualitative Results: Comparisons with PhotoApp [mallikarjun2021photoapp]. We conduct comparisons using the H3DS dataset [ramon2021h3d, caselles2023implicit]. The first column displays the input images, followed by three columns showcasing novel view synthesis results under the same environment map. This comparison highlights that PhotoApp fails to retain identity-specific details as effectively as our method. Notably, the subject's eyes and nose region appear altered across different views in PhotoApp, whereas they remain consistent and true to the input in Lite2Relight. Image credits to Pol Caselles.
  • ...and 4 more figures
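
The data flow described in the Figure 1 caption, steps (a) to (e), can be sketched as plain function composition. This is only an illustrative sketch, not the authors' implementation: the `Modules` container, every callable in it, and the toy lambdas are hypothetical stand-ins. Three details are assumptions not stated in the caption: that the residual is $\Delta I = I_s - I_{w^+}$, that $\Delta w$ is combined with $w^+_s$ by addition, and the argument list of `manipulate`, which only marks where the feature manipulation equation (not reproduced here) would apply.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class Modules:
    """Hypothetical stand-ins for the networks named in the Figure 1 caption."""
    encoder: Callable      # E: input image -> latent w+_s
    features: Callable     # G_sg: latent -> k-th convolutional feature G^k_s
    render: Callable       # full EG3D pass: (latent, camera pose c, feature override or None) -> image
    afa: Callable          # AFA module: (delta_I, G^k_s) -> F_s
    relight_net: Callable  # R: target environment map E_t -> delta_w
    manipulate: Callable   # placeholder for the feature manipulation equation (signature assumed)


def relight_portrait(I_s, E_t, c, m: Modules):
    w_s = m.encoder(I_s)                # (a) invert the input image
    I_w = m.render(w_s, c, None)        # (a) render the inverted image
    G_k = m.features(w_s)               # (a) extract the k-th convolutional feature
    delta_I = I_s - I_w                 # (b) image residual (assumed to be a plain difference)
    F_s = m.afa(delta_I, G_k)           # (b) aligned source feature
    w_t = w_s + m.relight_net(E_t)      # (c) relit latent code (additive combination assumed)
    F_t = m.manipulate(F_s, w_s, w_t)   # (d) placeholder for the equation that yields F_t
    return m.render(w_t, c, F_t)        # (e) forward pass with the k-th feature replaced by F_t


# Toy modules that only exercise the control flow; they model nothing physical.
toy = Modules(
    encoder=lambda img: np.full(4, img.mean()),
    features=lambda w: np.outer(w, w),
    render=lambda w, c, F: np.full((2, 2), w.sum()) + (0.0 if F is None else F.mean()),
    afa=lambda dI, G: G + dI.mean(),
    relight_net=lambda E: np.full(4, E.mean()),
    manipulate=lambda F, w_s, w_t: F,
)

out = relight_portrait(np.ones((2, 2)), np.full((3, 3), 0.5), None, toy)
```

Passing the networks in as callables keeps the sketch independent of any particular EG3D code base; real modules would replace the toy lambdas without changing `relight_portrait`.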