Light Up Your Face: A Physically Consistent Dataset and Diffusion Model for Face Fill-Light Enhancement

Jue Gong, Zihan Zhou, Jingkai Wang, Xiaohong Liu, Yulun Zhang, Xiaokang Yang

TL;DR

This work tackles face fill-light enhancement (FFE): adding a virtual fill light to a portrait while preserving the original background. It introduces LYF-160K, a large-scale, physically consistent dataset rendered with a disk-shaped fill light under a 6D parameterization, and a two-stage framework consisting of PALP, which provides physics-informed conditioning, and FiLitDiff, a one-step diffusion model for fast, controllable FFE. The approach demonstrates strong perceptual quality and competitive full-reference metrics while better preserving background illumination, outperforming several baselines and ablated variants. The dataset and framework offer a practical, physically grounded path to controllable portrait illumination editing, with potential impact on photography and video applications.

Abstract

Face fill-light enhancement (FFE) brightens underexposed faces by adding virtual fill light while keeping the original scene illumination and background unchanged. Most face relighting methods aim to reshape overall lighting, which can suppress the input illumination or modify the entire scene, leading to foreground-background inconsistency and a mismatch with practical FFE needs. To support scalable learning, we introduce LightYourFace-160K (LYF-160K), a large-scale paired dataset built with a physically consistent renderer that injects a disk-shaped area fill light controlled by six disentangled factors, producing 160K before-and-after pairs. We first pretrain a physics-aware lighting prompt (PALP) that embeds the 6D parameters into conditioning tokens, using an auxiliary planar-light reconstruction objective. Building on a pretrained diffusion backbone, we then train a fill-light diffusion model (FiLitDiff), an efficient one-step model conditioned on physically grounded lighting codes, enabling controllable and high-fidelity fill lighting at low computational cost. Experiments on held-out paired sets demonstrate strong perceptual quality and competitive full-reference metrics, while better preserving background illumination. The dataset and model will be released at https://github.com/gobunu/Light-Up-Your-Face.
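
To make the "six disentangled factors" and the before-and-after pairing concrete, the following is a minimal Python sketch, not the authors' renderer. The six factors follow the figure captions (color temperature, half-peak angle, light-to-subject distance, disk diameter, and a two-component image-plane offset); the field names, the units other than Kelvin and pixels, the helper `add_fill_light`, the mask gating, and the clipping to [0, 1] are all assumptions made for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class FillLightParams:
    """Six fill-light factors; field names are hypothetical."""
    color_temp_k: float   # color temperature T, in Kelvin
    half_peak_deg: float  # half-peak angle theta_hp (degrees assumed)
    distance: float       # light-to-subject distance Z_0 (unit assumed)
    disk_diameter: float  # disk diameter D_lamp (unit assumed)
    dx_px: float          # image-plane offset along x, in pixels
    dy_px: float          # image-plane offset along y, in pixels

    def as_vector(self):
        """Flatten the six factors into a plain 6D list."""
        return list(astuple(self))


def add_fill_light(original, residual, face_mask):
    """Return original + mask * residual per pixel, clipped to [0, 1].

    All arguments are flat lists of equal length. Gating the residual
    with a mask and clipping the sum are assumptions of this sketch.
    """
    if not (len(original) == len(residual) == len(face_mask)):
        raise ValueError("inputs must have the same length")
    return [min(1.0, max(0.0, o + m * r))
            for o, r, m in zip(original, residual, face_mask)]
```

In this sketch a pixel with mask weight 0 is returned unchanged, which mirrors the stated goal of leaving the background untouched while the face region receives the additive residual.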

Paper Structure (13 sections, 15 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Controlling fill-light position and color temperature. Given the same input portrait, we move the virtual light along a circular trajectory while linearly increasing the color temperature and keeping the beam shape fixed. Each result is annotated with $(\Delta x,\Delta y,T)$, where $(\Delta x,\Delta y)$ denotes the lamp offset in pixels and $T$ is the corresponding color temperature in Kelvin.
  • Figure 2: Visualization of PALP predictions and effect on FFE. For two example inputs, we show the predicted planar irradiance map and direction field, together with the resulting FFE output generated under the corresponding lighting code.
  • Figure 3: 6D control of our disk-shaped area fill light, including color temperature $T$, half-peak angle $\theta_{\mathrm{hp}}$, light-to-subject distance $Z_0$, disk diameter $\mathrm{D}_{\mathrm{lamp}}$, and image-plane offset $(\Delta x,\Delta y)$.
  • Figure 4: Overview of our physically consistent dataset pipeline and fill-light renderer. Given an input portrait $I_{\mathrm{orig}}$, we estimate depth/normal, intrinsic albedo/specular, and a face mask, then render a disk-shaped area fill light controlled by 6D parameters to produce an additive residual $\Delta I_{\mathrm{lamp}}$, yielding paired images before and after fill lighting.
  • Figure 5: Overview of our framework. PALP encodes the 6D fill-light parameters into diffusion-compatible conditioning tokens via FiLM modulation and a shallow Transformer, and is pretrained with an auxiliary planar-light reconstruction decoder. The resulting lighting prompt conditions FiLitDiff, a one-step diffusion model fine-tuned from Stable Diffusion to perform controllable FFE.
  • ...and 2 more figures
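
The parameter sweep described in the Figure 1 caption (a circular lamp trajectory with linearly increasing color temperature) can be sketched as below. This is an illustrative reconstruction rather than the authors' code: the function name `circular_sweep`, the radius, the number of steps, and the temperature range are hypothetical, and the caption does not say where on the circle the sweep starts or in which direction it runs.

```python
import math


def circular_sweep(n_steps, radius_px, t_start_k, t_end_k):
    """Return n_steps (dx, dy, T) triples.

    (dx, dy) walks once around a circle of radius radius_px (pixels),
    starting on the positive x axis (an assumption), while T rises
    linearly from t_start_k to t_end_k (Kelvin).
    """
    if n_steps < 2:
        raise ValueError("need at least two steps for a linear ramp")
    sweep = []
    for i in range(n_steps):
        angle = 2.0 * math.pi * i / n_steps   # position on the circle
        frac = i / (n_steps - 1)              # 0.0 at first step, 1.0 at last
        dx = radius_px * math.cos(angle)
        dy = radius_px * math.sin(angle)
        temp = t_start_k + frac * (t_end_k - t_start_k)
        sweep.append((dx, dy, temp))
    return sweep


if __name__ == "__main__":
    # Hypothetical values: 8 lamp positions, 100 px radius, 3000 K to 6500 K.
    for dx, dy, temp in circular_sweep(8, 100.0, 3000.0, 6500.0):
        print(f"dx={dx:7.1f}  dy={dy:7.1f}  T={temp:6.0f} K")
```

Each triple would set the offset and color temperature of one lighting code, with the remaining factors (half-peak angle, distance, disk diameter) held constant to keep the beam shape fixed.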