SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, Sanghyun Woo

TL;DR

SwitchLight treats portrait relighting as an ill-posed problem and addresses it by jointly designing a physics-driven architecture and a self-supervised pre-training strategy. It replaces the empirical Phong model used in prior work with the Cook-Torrance BRDF and introduces Multi-Masked Autoencoder (MMAE) pre-training to scale training without relying solely on scarce light-stage data. The framework decomposes an input portrait into normals, albedo, roughness, and lighting (inverse rendering), then re-renders it under the target lighting, with a neural network refining the physically-based result. Results on OLAT and FFHQ-based studies show improved realism, finer skin and hair detail, and more consistent lighting, suggesting potential for VR/AR content creation and applications beyond still images.

Abstract

We introduce a co-designed approach for human portrait relighting that combines a physics-guided architecture with a pre-training framework. Drawing on the Cook-Torrance reflectance model, we have meticulously configured the architecture design to precisely simulate light-surface interactions. Furthermore, to overcome the limitation of scarce high-quality lightstage data, we have developed a self-supervised pre-training strategy. This novel combination of accurate physical modeling and an expanded training dataset establishes a new benchmark in relighting realism.
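
As a rough illustration of the reflectance model named above, the following Python sketch evaluates a Cook-Torrance BRDF at a single surface point under one directional light and composites the diffuse and specular terms. The GGX distribution, Schlick Fresnel, and Smith-Schlick geometry terms, the default `f0=0.04`, and all function names are assumptions chosen for illustration; the text here does not specify them, and the method itself works on whole intrinsic maps and a target illumination rather than a single light.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    # Leave a zero vector unchanged instead of dividing by zero.
    return tuple(x / n for x in v) if n > 0 else tuple(v)


def cook_torrance(normal, light_dir, view_dir, albedo, roughness, f0=0.04):
    """Return (diffuse_rgb, specular) for unit incoming radiance.

    Assumed terms (not taken from the source): GGX normal distribution,
    Schlick Fresnel, Smith-Schlick geometry. Both outputs already include
    the cosine factor n.l.
    """
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    n_dot_v = max(dot(n, v), 1e-4)  # clamp to avoid division by zero
    if n_dot_l == 0.0:
        # Light is at or below the horizon: nothing is reflected.
        return (0.0, 0.0, 0.0), 0.0

    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    n_dot_h = max(dot(n, h), 0.0)
    v_dot_h = max(dot(v, h), 0.0)

    # D: GGX microfacet distribution.
    a2 = (roughness * roughness) ** 2
    d = a2 / (math.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)
    # F: Schlick approximation of Fresnel reflectance.
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5
    # G: Smith geometry term with Schlick's k for direct lighting.
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * (n_dot_v / (n_dot_v * (1.0 - k) + k))

    specular = d * f * g / (4.0 * n_dot_l * n_dot_v) * n_dot_l
    diffuse = tuple(c / math.pi * n_dot_l for c in albedo)  # Lambertian term
    return diffuse, specular


def pbr_render(diffuse, specular):
    """Composite the diffuse and specular renders into one RGB value."""
    return tuple(c + specular for c in diffuse)


diffuse, specular = cook_torrance(
    normal=(0, 0, 1), light_dir=(0, 0, 1), view_dir=(0, 0, 1),
    albedo=(0.8, 0.6, 0.5), roughness=0.5,
)
rgb = pbr_render(diffuse, specular)
```

In this sketch, lowering `roughness` sharpens and brightens the specular peak, which is the kind of behavior an empirical Phong exponent only approximates.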

Paper Structure (31 sections, 12 equations, 16 figures, 4 tables)

Figures (16)

  • Figure 1: Be Anywhere at Any Time. SwitchLight processes a human portrait by decomposing it into detailed intrinsic components, and re-renders the image under a designated target illumination, ensuring a seamless composition of the subject into any new environment.
  • Figure 2: SwitchLight Architecture. The input source image is decomposed into normal map, lighting, diffuse and specular components. Given these intrinsics, images are re-rendered under target lighting. The architecture integrates the Cook-Torrance reflection model; the final output combines physically-based predictions with neural network enhancements for realistic portrait relighting.
  • Figure 3: Render Net Overview. Utilizing extracted image intrinsics, it employs the Cook-Torrance model for initial relighting and a neural network for enhanced refinement, producing high-fidelity relit images through a synergistic computational approach.
  • Figure 4: Neural Render Enhancement. Using the Cook-Torrance model, diffuse and specular renders are computed, which are then composited into a physically-based rendering. Subsequently, a neural network enhances this PBR render, improving aspects such as brightness and specular details.
  • Figure 5: Dynamic Masking Strategies. We have generalized the MAE masks to include overlapping patches of varying sizes, as well as outpainting and free-form masks.
  • ...and 11 more figures
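
The masking idea in the Figure 5 caption (overlapping patches of varying sizes, plus outpainting masks) can be sketched in plain Python as follows. This is a minimal illustration only: the function names, patch sizes, mask ratio, and sampling procedure are hypothetical and are not taken from the paper, and free-form masks are omitted.

```python
import random


def random_patch_mask(height, width, mask_ratio=0.5, patch_sizes=(4, 8, 16), seed=None):
    """Return a height x width grid of booleans (True = masked pixel).

    Square patches of randomly chosen sizes are dropped at random positions
    and may overlap, until at least `mask_ratio` of the pixels are masked.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    mask = [[False] * width for _ in range(height)]
    target = int(mask_ratio * height * width)
    masked = 0
    while masked < target:
        size = rng.choice(patch_sizes)
        top = rng.randrange(0, max(height - size, 0) + 1)
        left = rng.randrange(0, max(width - size, 0) + 1)
        for y in range(top, min(top + size, height)):
            for x in range(left, min(left + size, width)):
                if not mask[y][x]:  # count each pixel only once
                    mask[y][x] = True
                    masked += 1
    return mask


def outpainting_mask(height, width, keep=0.5):
    """Mask everything except a centered window covering `keep` of each side."""
    kh, kw = int(height * keep), int(width * keep)
    top, left = (height - kh) // 2, (width - kw) // 2
    return [
        [not (top <= y < top + kh and left <= x < left + kw) for x in range(width)]
        for y in range(height)
    ]


patch_mask = random_patch_mask(32, 32, mask_ratio=0.5, seed=0)
border_mask = outpainting_mask(8, 8, keep=0.5)
```

A masked-autoencoder style pre-training loop would hide the `True` pixels from the encoder and ask the model to reconstruct them; mixing mask types is one plausible way to vary the difficulty of that reconstruction task.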