Colorful Diffuse Intrinsic Image Decomposition in the Wild

Chris Careaga, Yağız Aksoy

TL;DR

The paper tackles intrinsic image decomposition under colorful, real-world illumination by extending the classic grayscale Lambertian model to include a colorful shading component and a residual non-diffuse term. It introduces a multi-stage pipeline that first estimates shading chroma, then sparse albedo, and finally diffuse shading plus a residual layer, enabling RGB shading and specularity separation in high resolution for in-the-wild images. Quantitative results on MAW show state-of-the-art albedo accuracy in both intensity and chromaticity, while ARAP demonstrates strong generalization to out-of-distribution scenes; qualitative analysis and ablations validate the benefits of the staged approach over a single large model. The method enables practical illumination-aware editing, including specularity removal and per-pixel white balancing, and lays groundwork for more realistic inverse rendering in diverse real-world imagery.

Abstract

Intrinsic image decomposition aims to separate the surface reflectance and the effects from the illumination given a single photograph. Due to the complexity of the problem, most prior works assume a single-color illumination and a Lambertian world, which limits their use in illumination-aware image editing applications. In this work, we separate an input image into its diffuse albedo, colorful diffuse shading, and specular residual components. We arrive at our result by gradually removing first the single-color illumination and then the Lambertian-world assumptions. We show that by dividing the problem into easier sub-problems, in-the-wild colorful diffuse shading estimation can be achieved despite the limited ground-truth datasets. Our extended intrinsic model enables illumination-aware analysis of photographs and can be used for image editing applications such as specularity removal and per-pixel white balancing.
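
As a hedged sketch of the progression the abstract describes, the three image formation models can be written as below. The symbols $S_g$, $A_d$, and $S_d$ appear in the Figure 2 caption; the symbols $I$ (input image), $A$ (albedo under the diffuse-only models), $S_c$ (RGB shading), $R$ (residual), and the element-wise product $\odot$ are notation assumed here for illustration and may differ from the paper's own.

```latex
\begin{align}
  I &= A \odot S_g       && \text{(grayscale shading, Lambertian world)} \\
  I &= A \odot S_c       && \text{(colorful shading, Lambertian world)} \\
  I &= A_d \odot S_d + R && \text{(colorful diffuse shading plus non-diffuse residual)}
\end{align}
```

Each line drops one assumption from the line above it: first the single-color illumination, then the Lambertian world.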

Paper Structure

This paper contains 26 sections, 7 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: In this work, we extend the in-the-wild intrinsic decomposition formulations to include a colorful shading component as well as a non-diffuse residual component. This extended image formation enables illumination-aware image editing applications, such as specularity removal as shown at the top, and per-pixel white balancing. Images from Unsplash by NorWood Themes (pots), mos design (street) and Josh Carter (mountain).
  • Figure 2: Our pipeline starts with an input image and a shading/albedo pair that follows the simplified grayscale intrinsic diffuse model and is generated via an off-the-shelf method. We first extend the image formation model to include colorful shading, and estimate the shading color using our chroma network. This color information is used as input in the second step, where we estimate the high-resolution diffuse albedo. In the final step, we remove the Lambertian-world assumption and estimate a colorful diffuse shading component and a non-diffuse residual layer. A single variable is estimated at each step, $S_g$, $C$, $A_d$, and $S_d$, respectively, and other intrinsic components are computed using the corresponding intrinsic image formation model with increasing representative power. Image from Unsplash by Nathan Van Egmond.
  • Figure 3: The initial albedo map that we use as input contains significant color shifts due to the grayscale shading assumption. The shading chroma estimated by our first network (Sec. \ref{sec:method:chroma}) corrects these color shifts, but the resulting albedo still contains fine details coming from complex illumination. Our albedo estimation network (Sec. \ref{sec:method:albedo}) is able to remove the effects of the illumination and estimate a sparse albedo map. Image from Unsplash by Holly Stratton.
  • Figure 4: Starting from a grayscale shading estimation, we first estimate the shading chroma (Sec. \ref{sec:method:chroma}) and create a colorized shading map. In the final step of our pipeline (Sec. \ref{sec:method:shading}), we further separate the illumination into diffuse shading and non-diffuse residual components. The positive part of the residual represents the specularities in the scene, while the negative part shows the over-exposed regions. Image from Unsplash by Jiwoo Park.
  • Figure 5: Given the large number of recently proposed diffusion-based methods, we provide a focused qualitative evaluation against these models. These examples show some of the shortcomings of utilizing generative modeling to address the problem of intrinsic decomposition. Since these methods learn a mapping in the latent space of large pre-trained generative models, their outputs can have unintended side effects such as warped faces and illegible text. These alterations can have a negative impact on downstream editing applications. Additionally, although these methods can achieve the high-level appearance of albedo, they are highly dependent on their training data distribution, which can cause effects such as large color shifts and baked-in shading. Images from Unsplash (from top to bottom) by Mert Kahveci, Joel Muniz, Dollar Gill, and Annie Spratt.
  • ...and 6 more figures
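
The relationships between the intrinsic components named in the captions for Figures 2 and 4 can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the networks that estimate each component are not modeled, images are assumed to be linear RGB float arrays of shape (H, W, 3), and the white-balancing rule (replacing colorful shading with its per-pixel channel mean) is one plausible reading of "per-pixel white balancing".

```python
import numpy as np

EPS = 1e-6  # guards against division by zero in dark shading regions


def colorize_shading(s_gray, chroma):
    """Combine (H, W) grayscale shading with (H, W, 3) chroma into RGB shading.

    Assumption: chroma acts as a per-pixel RGB multiplier on the gray shading.
    """
    return s_gray[..., None] * chroma


def albedo_from_shading(image, shading):
    """Invert the diffuse-only model image = albedo * shading."""
    return image / np.maximum(shading, EPS)


def residual(image, albedo_d, shading_d):
    """Non-diffuse residual in image = albedo_d * shading_d + residual."""
    return image - albedo_d * shading_d


def split_residual(resid):
    """Split the residual into its positive part (specularities, per the
    Figure 4 caption) and its negative part (over-exposed regions)."""
    return np.maximum(resid, 0.0), np.minimum(resid, 0.0)


def remove_specularities(image, albedo_d, shading_d):
    """Subtract only the positive residual, leaving the rest of the image."""
    positive, _ = split_residual(residual(image, albedo_d, shading_d))
    return image - positive


def white_balance(albedo_d, shading_d, resid):
    """Assumed rule: neutralize illumination color per pixel by replacing
    the RGB shading with its channel mean, then recompose the image."""
    gray = shading_d.mean(axis=-1, keepdims=True)
    return albedo_d * gray + resid
```

Because each helper only applies the image formation model element-wise, the same code works at any resolution; the difficult part, estimating chroma, albedo, and diffuse shading from a single photograph, is what the staged networks in the paper address.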