UniLight: A Unified Representation for Lighting
Zitian Zhang, Iliyan Georgiev, Michael Fischer, Yannick Hold-Geoffroy, Jean-François Lalonde, Valentin Deschaintre
TL;DR
UniLight introduces a unified latent lighting representation that bridges text, images, irradiance, and environment maps through modality-specific encoders trained with a cross-modal contrastive objective and an auxiliary spherical-harmonics loss. The approach relies on a compact fusion module to produce a shared embedding, enabling cross-modal retrieval, environment-map generation, and light-controlled diffusion-based synthesis. A multi-modal dataset with aligned modalities supports robust training and evaluation across lighting tasks. The results demonstrate transferable, directional lighting understanding and practical control for lighting-aware image synthesis and editing.
Abstract
Lighting has a strong influence on visual appearance, yet understanding and representing lighting in images remains notoriously difficult. Various lighting representations exist, such as environment maps, irradiance, spherical harmonics, or text, but they are incompatible, which limits cross-modal transfer. We thus propose UniLight, a joint latent space that serves as a lighting representation and unifies multiple modalities within a shared embedding. Modality-specific encoders for text, images, irradiance, and environment maps are trained contrastively to align their representations, with an auxiliary spherical-harmonics prediction task reinforcing directional understanding. Our multi-modal data pipeline enables large-scale training and evaluation across three tasks: lighting-based retrieval, environment-map generation, and lighting control in diffusion-based image synthesis. Experiments show that our representation captures consistent and transferable lighting features, enabling flexible manipulation across modalities.
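To make the training objective described above more concrete, the following is a minimal sketch of a cross-modal contrastive loss combined with an auxiliary spherical-harmonics regression term. It is not the authors' implementation: the symmetric InfoNCE form (in the style of CLIP), the temperature of 0.07, the mean-squared-error SH loss, the `sh_weight` parameter, and all function names are assumptions chosen for illustration, since the abstract does not specify them. Embeddings are assumed to be non-zero vectors, and row i of every modality's batch is assumed to describe the same lighting condition.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, 1e-12) for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings.

    anchors[i] and positives[i] are assumed to come from two modalities
    observing the same lighting; all other pairs act as negatives.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    # logits[i][j] = cosine similarity of anchor i and positive j, scaled.
    logits = [
        [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(z - m) for z in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))


def sh_mse(pred, target):
    """Mean squared error between predicted and reference SH coefficients."""
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / len(pred)


def total_loss(embeddings_by_modality, sh_pred, sh_target, sh_weight=1.0):
    """Average pairwise contrastive loss plus a weighted SH regression term.

    embeddings_by_modality maps a modality name (e.g. "text", "envmap")
    to a batch of embeddings; sh_weight is a hypothetical balancing factor.
    """
    names = sorted(embeddings_by_modality)
    pair_losses = [
        info_nce(embeddings_by_modality[m1], embeddings_by_modality[m2])
        for i, m1 in enumerate(names)
        for m2 in names[i + 1:]
    ]
    contrastive = sum(pair_losses) / len(pair_losses)
    aux = sum(sh_mse(p, t) for p, t in zip(sh_pred, sh_target)) / len(sh_pred)
    return contrastive + sh_weight * aux
```

In this sketch the contrastive term pulls together embeddings of the same lighting across modalities, while the SH term ties the embedding to an explicit low-frequency directional description of the light. Swapping the rows of one modality's batch so that pairs no longer match raises the contrastive loss, which is the behavior that cross-modal retrieval relies on.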
