Personalized Video Relighting With an At-Home Light Stage

Jun Myeong Choi, Max Christman, Roni Sengupta

TL;DR

This paper develops a novel image-based neural relighting architecture that separates the intrinsic appearance features of the face from the source lighting, then combines them with the target lighting to generate a relit image.

Abstract

In this paper, we develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos under any pose, expression, and lighting condition in real time. Existing relighting algorithms typically rely either on publicly available synthetic data, which yields poor relighting results, or on actual light stage data, which is difficult to acquire. We show that, just by capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of performing high-quality relighting under any condition. Our key contribution is a novel image-based neural relighting architecture that effectively separates the intrinsic appearance features (the geometry and reflectance of the face) from the source lighting and then combines them with the target lighting to generate a relit image. This neural architecture enables smoothing of the intrinsic appearance features, leading to temporally stable video relighting. Both qualitative and quantitative evaluations show that our architecture improves portrait image relighting quality and temporal consistency over state-of-the-art approaches on both the casually captured 'Light Stage at Your Desk' (LSYD) and the light-stage-captured 'One Light At a Time' (OLAT) datasets.

Paper Structure (13 sections, 7 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: We learn a personalized relighting algorithm that generates temporally consistent and high-quality portrait videos under different lighting. We train the network using recordings of users watching YouTube on a monitor, thereby creating a Light Stage at Your Desk (LSYD). We project a portion of the LDR environment map with a $180^{\circ}$ FoV as the monitor light, while a portion of the remaining $90^{\circ}$ FoV is mapped as the background. We can achieve a harmonization effect with the virtual background.
  • Figure 2: We highlight the key structural differences between our relighting architecture and that of [Google, Roni]. Our approach removes source lighting information from the input image features and propagates only intrinsic appearance (geometry and reflectance) features from the encoder to the decoder, which results in better relighting quality and more temporal stability. In contrast, [Google, Roni] propagate entire image features from the encoder to the decoder without de-lighting them, and expect the decoder to remove source lighting and add target lighting information.
  • Figure 3: We first de-light the input image features extracted by the U-Net encoder using Adaptive Instance Normalization (AdaIN) guided by the lighting features extracted from the source lighting with a Light Encoder. We then pass these light-normalized encoder features to the decoder of the U-Net and apply another set of AdaIN guided by the features extracted from the target lighting with the Light Encoder. We additionally predict source lighting from the U-Net encoder using a Light Decoder.
  • Figure 4: We perform a qualitative comparison with existing techniques [Roni, Google] on the LSYD dataset. Source and target (images and lighting) were unseen during training. All models are personalized, i.e., trained on images of that individual only. Our method (Col. 3) produces significantly better results than existing approaches (Cols. 4 and 5).
  • Figure 5: We perform a qualitative comparison on the OLAT dataset [video_paper6]. Our approach outperforms [Google, Roni] and can render strong directional lighting and specular highlights without any explicit modeling of geometry and reflectance.
  • ...and 4 more figures
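
The de-light and relight steps described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, and the linear map standing in for the Light Encoder (`light_to_affine`, with random weights) are all assumptions made for illustration, and the U-Net encoder and decoder layers that sit between the two AdaIN stages are omitted. The only part taken from the caption is the structure: AdaIN guided by source-lighting features, followed by AdaIN guided by target-lighting features.

```python
import numpy as np


def adain(features, gamma, beta, eps=1e-5):
    """Adaptive Instance Normalization on a (C, H, W) feature map.

    Each channel is normalized to zero mean and (approximately) unit
    variance over its spatial dimensions, then scaled by gamma and
    shifted by beta, both of shape (C,).
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]


def light_to_affine(light_code, weight, bias):
    """Hypothetical stand-in for the Light Encoder.

    Maps a lighting code of shape (L,) to per-channel AdaIN parameters
    (gamma, beta) with a single linear layer. The real encoder is a
    learned network; this is an assumption for illustration only.
    """
    out = weight @ light_code + bias  # shape (2 * C,)
    gamma, beta = np.split(out, 2)
    return gamma, beta


rng = np.random.default_rng(0)
C, H, W, L = 4, 8, 8, 6  # toy sizes, not taken from the paper

encoder_features = rng.normal(size=(C, H, W))  # stand-in for U-Net encoder output
source_light = rng.normal(size=L)              # stand-in for source lighting code
target_light = rng.normal(size=L)              # stand-in for target lighting code

# Separate (random, untrained) parameters for the two AdaIN stages.
w_delight, b_delight = rng.normal(size=(2 * C, L)), rng.normal(size=2 * C)
w_relight, b_relight = rng.normal(size=(2 * C, L)), rng.normal(size=2 * C)

# Stage 1: de-light the encoder features, guided by the source lighting.
gamma_s, beta_s = light_to_affine(source_light, w_delight, b_delight)
delit_features = adain(encoder_features, gamma_s, beta_s)

# Stage 2: relight the de-lit features, guided by the target lighting.
gamma_t, beta_t = light_to_affine(target_light, w_relight, b_relight)
relit_features = adain(delit_features, gamma_t, beta_t)
```

In this toy version the second AdaIN directly overwrites the per-channel statistics set by the first, so the per-channel mean of `relit_features` equals `beta_t`. In the architecture the caption describes, decoder layers process the light-normalized features in between, so the two stages are not redundant.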