Training Neural Networks on RAW and HDR Images for Restoration Tasks
Andrew Yanzhe Ke, Lei Luo, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Alexandre Chapiro, Rafał K. Mantiuk
TL;DR
The paper tackles how to train neural networks for restoration tasks on HDR and RAW data, comparing linear and display-encoded color spaces as well as several loss formulations. It benchmarks three restoration tasks (denoising, deblurring, and single-image super-resolution) on HDR and RAW datasets, using four representations (Linear, μ-law, PQ, and PU21) and eight training combinations. The key finding is that training on display-encoded representations yields gains of 2 to 9 dB over linear-L1 baselines, although no single encoding is best in every case; perceptual encodings also improve data efficiency and reduce artifacts. The results offer practical guidelines: small, well-chosen changes to the training setup can substantially improve restoration quality on HDR and RAW content, with trade-offs between perceptual uniformity and encoding complexity.
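The display encodings compared here map linear values into a roughly perceptually uniform [0, 1] range. The following is a minimal Python sketch of two of them, not the authors' code: PQ, using the SMPTE ST 2084 constants with absolute luminance in cd/m², and μ-law, assuming the common form $\log(1+\mu x)/\log(1+\mu)$ on linear values pre-normalized to [0, 1] with an assumed $\mu = 5000$ (this section does not state the value). PU21 is left out because its fitted parameters are not given here, and all function names are illustrative.

```python
import math

# PQ (SMPTE ST 2084) constants
PQ_M1 = 2610 / 16384
PQ_M2 = 2523 / 4096 * 128
PQ_C1 = 3424 / 4096
PQ_C2 = 2413 / 4096 * 32
PQ_C3 = 2392 / 4096 * 32
PQ_PEAK = 10000.0  # peak luminance of the PQ range, cd/m^2


def pq_encode(luminance):
    """Absolute linear luminance (cd/m^2) -> PQ code value in [0, 1]."""
    y = min(max(luminance / PQ_PEAK, 0.0), 1.0)
    yp = y ** PQ_M1
    return ((PQ_C1 + PQ_C2 * yp) / (1.0 + PQ_C3 * yp)) ** PQ_M2


def pq_decode(value):
    """PQ code value in [0, 1] -> absolute linear luminance (cd/m^2)."""
    vp = min(max(value, 0.0), 1.0) ** (1.0 / PQ_M2)
    y = max(vp - PQ_C1, 0.0) / (PQ_C2 - PQ_C3 * vp)
    return PQ_PEAK * y ** (1.0 / PQ_M1)


MU = 5000.0  # assumed mu-law parameter, not taken from the paper


def mu_law_encode(x, mu=MU):
    """Linear value normalized to [0, 1] -> mu-law value in [0, 1]."""
    return math.log1p(mu * x) / math.log1p(mu)


def mu_law_decode(v, mu=MU):
    """Inverse of mu_law_encode."""
    return math.expm1(v * math.log1p(mu)) / mu


if __name__ == "__main__":
    # Show how PQ spreads luminance levels across the code range.
    for nits in (0.1, 1.0, 100.0, 1000.0, 10000.0):
        print(f"{nits:>8} cd/m^2 -> PQ {pq_encode(nits):.3f}")
```

With these definitions, 100 cd/m² lands near the middle of the PQ range (about 0.51), whereas a linear encoding scaled to 10,000 cd/m² would place it at 0.01; this redistribution of code values toward dark levels is what makes the encoded space closer to perceptually uniform.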
Abstract
The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0–1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearly related to colorimetric quantities of light. While training on commonly available display-encoded images is a well-established practice, there is no consensus on how neural networks should be trained for tasks on RAW and HDR images in linear color spaces. In this work, we test several approaches on three popular image restoration applications: denoising, deblurring, and single-image super-resolution. We examine whether HDR/RAW images need to be display-encoded using popular transfer functions (PQ, PU21, and μ-law), or whether it is better to train in linear color spaces but use loss functions that correct for perceptual non-uniformity. Our results indicate that neural networks train significantly better on HDR and RAW images represented in display-encoded color spaces, which offer better perceptual uniformity than linear spaces. This small change to the training strategy can bring substantial performance gains of 2 to 9 dB.
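The two strategies contrasted above differ in where the perceptual correction happens. The sketch below is a hypothetical illustration, not the paper's training code: it assumes per-pixel lists of linear values normalized to [0, 1], uses μ-law with an assumed $\mu = 5000$ as the encoding, and compares a plain linear L1 loss with an L1 loss computed on encoded values while the network output stays linear. All names are illustrative.

```python
import math

MU = 5000.0  # assumed mu-law parameter; inputs are linear values in [0, 1]


def mu_law(x, mu=MU):
    """Display-encode a linear value normalized to [0, 1]."""
    return math.log1p(mu * x) / math.log1p(mu)


def l1(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def linear_l1(pred_lin, target_lin):
    """Baseline: L1 on linear values, so errors in bright pixels dominate."""
    return l1(pred_lin, target_lin)


def encoded_l1(pred_lin, target_lin, encode=mu_law):
    """Network stays in linear space; the loss compares display-encoded values."""
    return l1([encode(p) for p in pred_lin], [encode(t) for t in target_lin])


if __name__ == "__main__":
    # Same absolute linear error (0.01) on a dark and on a bright pixel.
    dark_pred, dark_target = [0.03], [0.02]
    bright_pred, bright_target = [0.91], [0.90]
    print("linear L1 :", linear_l1(dark_pred, dark_target),
          linear_l1(bright_pred, bright_target))
    print("encoded L1:", encoded_l1(dark_pred, dark_target),
          encoded_l1(bright_pred, bright_target))
```

For the same 0.01 linear error, the linear loss treats both pixels equally, while the encoded loss penalizes the dark pixel roughly 36 times more than the bright one. In the fully display-encoded strategy, the network itself would take and return μ-law values and the loss would be plain `l1` on those; the loss-side variant shown here keeps the network linear and moves the encoding only into the loss.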
