SAIL: Self-supervised Albedo Estimation from Real Images with a Latent Diffusion Model
Hala Djeghim, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry Tsishkou, Céline Loscos, Désiré Sidibé
TL;DR
SAIL tackles intrinsic image decomposition of real-world images by estimating albedo-like representations with a latent diffusion prior, trained only on unlabeled multi-illumination data. Each image latent is decomposed as $z_i = z^A + z_i^E$. Here $z^A$ is the illumination-independent albedo latent, which is decoded to the albedo $A$, and $z_i^E$ encodes the lighting of image $i$. The model is trained with an unconditioned relighting objective together with regularizers in latent space. Because the decomposition is formulated entirely in the latent space of the diffusion model, SAIL estimates albedo robustly and supports downstream relighting and appearance editing without labeled data. Experiments on MIDIntrinsics and in-the-wild images show improved albedo consistency across lighting conditions and competitive performance relative to both supervised and self-supervised baselines.
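The additive latent decomposition can be illustrated with a minimal NumPy sketch. It rests on several assumptions. Latents are stored as (C, H, W) arrays, and the 4-channel shape used below is only illustrative. The albedo latent $z^A$ is taken as already given. The two penalties are simple squared-error stand-ins for "a regularizer on the lighting-independent term" and "a regularizer on the lighting-dependent term". All function names and loss forms are hypothetical; they are not SAIL's actual losses.

```python
import numpy as np


def lighting_latents(z_obs: np.ndarray, z_albedo: np.ndarray) -> np.ndarray:
    """Recover per-illumination lighting latents z_i^E = z_i - z^A.

    z_obs:    (N, C, H, W) latents of one scene under N illuminations.
    z_albedo: (C, H, W) shared albedo latent z^A.
    """
    # Broadcast the single albedo latent over the N illuminations.
    return z_obs - z_albedo[None]


def albedo_consistency(z_albedo_preds: np.ndarray) -> float:
    """Hypothetical regularizer on the lighting-independent term.

    Albedo latents predicted from each of the N images of the same scene
    should agree, so penalize their mean squared spread around their mean.
    """
    mean = z_albedo_preds.mean(axis=0, keepdims=True)
    return float(((z_albedo_preds - mean) ** 2).mean())


def lighting_energy(z_light: np.ndarray) -> float:
    """Hypothetical regularizer on the lighting-dependent term.

    Keeping z_i^E small pushes illumination-independent content into z^A.
    """
    return float((z_light ** 2).mean())


# Usage on synthetic latents: one scene, three illuminations.
rng = np.random.default_rng(0)
z_A = rng.normal(size=(4, 8, 8))              # shared albedo latent z^A
z_E = 0.1 * rng.normal(size=(3, 4, 8, 8))     # per-illumination lighting latents z_i^E
z_obs = z_A[None] + z_E                       # z_i = z^A + z_i^E
recovered = lighting_latents(z_obs, z_A)
print(np.allclose(recovered, z_E))
```

The additive form means that once $z^A$ is known, each lighting term follows by subtraction. A consistency penalty is therefore a natural way to encourage a single $z^A$ shared across illuminations. In the actual method, $z^A$ is predicted by the diffusion model rather than given.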
Abstract
Intrinsic image decomposition aims to separate an image into its underlying albedo and shading components, isolating the base color from lighting effects to enable downstream applications such as virtual relighting and scene editing. Despite the rise and success of learning-based approaches, intrinsic image decomposition of real-world images remains challenging due to the scarcity of labeled ground-truth data. Most existing solutions rely on synthetic data for supervised training, which limits their ability to generalize to real-world scenes. Self-supervised methods, on the other hand, often produce albedo maps that contain reflections and lack consistency under different lighting conditions. To address this, we propose SAIL, an approach designed to estimate albedo-like representations from single-view real-world images. We repurpose the prior knowledge of a latent diffusion model for unconditioned scene relighting as a surrogate objective for albedo estimation. To extract the albedo, we introduce a novel intrinsic image decomposition formulated entirely in the latent space. To guide the training of our latent diffusion model, we introduce regularization terms that constrain both the lighting-dependent and lighting-independent components of our latent image decomposition. SAIL predicts stable albedo under varying lighting conditions and generalizes to multiple scenes, using only unlabeled multi-illumination data available online.
