Revisiting Image Fusion for Multi-Illuminant White-Balance Correction

David Serrano-Lozano, Aditya Arora, Luis Herranz, Konstantinos G. Derpanis, Michael S. Brown, Javier Vazquez-Corral

TL;DR

This work addresses the challenge of white-balance correction in scenes with multiple illuminants by moving beyond linear fusion of predefined WB presets. It introduces an efficient transformer-based fusion mechanism that blends five sRGB WB presets in an end-to-end manner, capturing non-linear spatial interactions across presets. A new large-scale multi-illuminant sRGB dataset is also presented, comprising 16,284 images with ground-truth WB, enabling robust training and evaluation. Empirically, the proposed method outperforms prior fusion-based and single-illuminant WB methods across multi-illuminant, cross-camera, and cross-dataset scenarios, while offering improved efficiency. The work also provides a valuable dataset to spur further advances in multi-illuminant WB research.

Abstract

White balance (WB) correction in scenes with multiple illuminants remains a persistent challenge in computer vision. Recent methods have explored fusion-based approaches, where a neural network linearly blends multiple sRGB versions of an input image, each processed with a predefined WB preset. However, we demonstrate that these methods are suboptimal for common multi-illuminant scenarios. Additionally, existing fusion-based methods rely on sRGB WB datasets that lack dedicated multi-illuminant images, which limits both training and evaluation. To address these challenges, we make two key contributions. First, we propose an efficient transformer-based model that effectively captures spatial dependencies across sRGB WB presets, substantially improving upon linear fusion techniques. Second, we introduce a large-scale multi-illuminant dataset comprising over 16,000 sRGB images rendered with five different WB settings, along with WB-corrected images. Our method achieves up to 100% improvement over existing techniques on our new multi-illuminant image fusion dataset.
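To make the linear-fusion baseline concrete, the following is a minimal Python sketch, not the published model of any cited method. It assumes the network outputs one logit per preset and per pixel, and that these logits are normalized with a softmax into non-negative weights summing to one; the names `softmax`, `linear_fusion`, and `within_preset_bounds` are hypothetical. Under this assumption every output pixel is a convex combination of the preset pixels, so each channel is confined to the range spanned by the presets at that pixel.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) sRGB values, assumed in [0, 1]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_fusion(presets: Sequence[Sequence[Pixel]],
                  scores: Sequence[Sequence[float]]) -> List[Pixel]:
    """Blend K preset renderings pixel by pixel (hypothetical helper).

    presets[k][i] is pixel i rendered with WB preset k.
    scores[i][k] is an assumed network logit for preset k at pixel i.
    """
    num_presets = len(presets)
    num_pixels = len(presets[0])
    fused: List[Pixel] = []
    for i in range(num_pixels):
        w = softmax(scores[i])  # per-pixel weights over the presets
        fused.append(tuple(
            sum(w[k] * presets[k][i][c] for k in range(num_presets))
            for c in range(3)
        ))
    return fused


def within_preset_bounds(presets: Sequence[Sequence[Pixel]], i: int,
                         target: Pixel, eps: float = 1e-9) -> bool:
    """Necessary condition for reaching `target` by linear fusion at pixel i.

    A convex combination of scalars lies between their min and max, so if any
    channel of `target` falls outside the presets' range, linear fusion cannot
    produce it. Passing this check does not guarantee reachability.
    """
    for c in range(3):
        vals = [p[i][c] for p in presets]
        if not (min(vals) - eps <= target[c] <= max(vals) + eps):
            return False
    return True
```

For example, with equal logits for five presets the fused pixel is the per-channel mean of the presets, while a target whose green channel exceeds every preset's green value fails the bound check and is therefore out of reach for any choice of weights.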

Paper Structure

This paper contains 21 sections, 3 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Example from the RenderedWB afifi2019color dataset. (a)-(e) show the same scene processed with five distinct WB presets, while (f) presents the white-balanced image obtained using the color checker. Three sample points are selected across all images and marked with teal, yellow, and purple dots in the WB presets and with crosses in the ground-truth image. (g) visualizes these pixel values in sRGB space, along with the polytope formed by the WB presets; the results of our method are shown as stars. Note that each axis has a different scale to ease visualization.
  • Figure 2: Procedure for generating a ground-truth (GT) sRGB image for scenes with multiple illuminants. We begin with the dataset of Kim et al. kim2021large, which focuses on unprocessed RAW images. Each scene is first captured under a single illuminant, and additional illuminants are then introduced one at a time. (a) We first compute the WB-corrected image of the single-illuminant scene using the Macbeth color checker. (b) Next, we apply AWB to the multi-illuminant images. (c) Finally, we generate the GT image by adjusting the per-pixel brightness of the image from step (a) so that it matches the per-pixel brightness of the image from step (b).
  • Figure 3: Samples from our dataset, showing the same scene under varying lighting conditions and WB presets. Each column shows the image rendered with a specific WB setting, alongside the ground truth. All images include outdoor lighting; each of the first two rows adds a different type of indoor light, and the last row includes all three light sources (outdoor and both indoor types).
  • Figure 4: Qualitative results on our dataset and the synthetic test set afifi2022. Top to bottom: an image from the Sony split, two Nikon images, and an image from the synthetic dataset. Left to right: the WB preset with the lowest $\Delta$E2000 (a), DeepWB afifi2020deep (b), MixedWB afifi2022 (c), StyleWB kinli2023modeling (d), our transformer-based method (e), and the ground truth (f). In the first column (a), the name of the WB preset is shown in the bottom-left corner. The $\Delta$E2000 value of each image is shown in the bottom-right corner; lower values indicate better performance.
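The ground-truth construction in step (c) of Figure 2 can be sketched as a per-pixel rescaling. This is a minimal Python sketch under stated assumptions, not the authors' pipeline: the caption does not define the brightness measure, so the mean of the RGB channels is used as a hypothetical stand-in, and images are treated as flat lists of pixels assumed to be in linear RGB. The names `brightness` and `match_brightness` are hypothetical. Each pixel of the step (a) image keeps its channel ratios (its chromaticity) and is scaled so that its brightness equals that of the corresponding pixel in the step (b) AWB image.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # assumed linear RGB values


def brightness(p: Pixel) -> float:
    """Hypothetical brightness measure: mean of the three channels."""
    return (p[0] + p[1] + p[2]) / 3.0


def match_brightness(wb_single: Sequence[Pixel],
                     awb_multi: Sequence[Pixel],
                     eps: float = 1e-8) -> List[Pixel]:
    """Rescale each pixel of the step (a) image to the brightness of step (b).

    wb_single: single-illuminant image corrected with the color checker.
    awb_multi: multi-illuminant image after AWB, same pixel order.
    Channel ratios of wb_single are preserved; only the scale changes.
    """
    if len(wb_single) != len(awb_multi):
        raise ValueError("images must have the same number of pixels")
    out: List[Pixel] = []
    for src, ref in zip(wb_single, awb_multi):
        # eps guards against division by zero for black source pixels
        scale = brightness(ref) / max(brightness(src), eps)
        out.append(tuple(c * scale for c in src))
    return out
```

Scaling all three channels by the same factor changes brightness without altering the WB-corrected colors of step (a), which is the property the GT needs; any other brightness definition (for example, a weighted luma) could be substituted in `brightness` without changing the structure.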