Training Neural Networks on RAW and HDR Images for Restoration Tasks

Andrew Yanzhe Ke, Lei Luo, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Alexandre Chapiro, Rafał K. Mantiuk

TL;DR

The paper studies how to train neural networks for restoration tasks on HDR and RAW data, comparing linear and display-encoded color spaces and several loss formulations. It benchmarks three restoration tasks (denoising, deblurring, and single-image super-resolution) using four representations (Linear, $\mu$-law, PQ, and PU21) and eight training combinations, on HDR and RAW datasets. The key finding is that training with display-encoded representations yields gains of $2$–$9$ dB over linear-L1 baselines, though no single encoding is universally superior; perceptual encodings also improve data efficiency and reduce artifacts. The results give practitioners practical guidelines: small, well-chosen changes to the training setup can significantly improve restoration quality on HDR/RAW content, subject to trade-offs between perceptual uniformity and encoding complexity.

Abstract

The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearly related to colorimetric quantities of light. While training on commonly available display-encoded images is a well-established practice, there is no consensus on how neural networks should be trained for tasks on RAW and HDR images in linear color spaces. In this work, we test several approaches on three popular image restoration applications: denoising, deblurring, and single-image super-resolution. We examine whether HDR/RAW images need to be display-encoded using popular transfer functions (PQ, PU21, and mu-law), or whether it is better to train in linear color spaces, but use loss functions that correct for perceptual non-uniformity. Our results indicate that neural networks train significantly better on HDR and RAW images represented in display-encoded color spaces, which offer better perceptual uniformity than linear spaces. This small change to the training strategy can bring a very substantial gain in performance, between 2 and 9 dB.
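
As a rough illustration of what "display-encoding" a linear HDR value means, the sketch below implements two of the transfer functions named in the abstract. The PQ constants are those of SMPTE ST 2084 (also used in BT.2100). The $\mu$-law form and the value `mu = 5000` are assumptions taken from common HDR practice, not from the paper's own equation, and the function names are hypothetical. PU21 is omitted because its fitted coefficients are not given here.

```python
import math

# PQ constants as defined in SMPTE ST 2084.
PQ_M1 = 2610 / 16384
PQ_M2 = 2523 / 4096 * 128
PQ_C1 = 3424 / 4096
PQ_C2 = 2413 / 4096 * 32
PQ_C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # PQ is defined for luminance up to 10 000 cd/m^2


def pq_encode(luminance_nits: float) -> float:
    """Map absolute luminance (cd/m^2) to a PQ code value in [0, 1]."""
    # Clamp to the valid range, then normalise by the PQ peak luminance.
    y = min(max(luminance_nits, 0.0), PEAK_NITS) / PEAK_NITS
    y_m1 = y ** PQ_M1
    return ((PQ_C1 + PQ_C2 * y_m1) / (1.0 + PQ_C3 * y_m1)) ** PQ_M2


def mu_law_encode(luminance_nits: float, mu: float = 5000.0) -> float:
    """mu-law on input normalised to [0, 1]; mu = 5000 is an assumed value."""
    x = min(max(luminance_nits, 0.0), PEAK_NITS) / PEAK_NITS
    return math.log1p(mu * x) / math.log1p(mu)


if __name__ == "__main__":
    # Print encoded values for a few luminance levels one decade apart.
    for nits in (0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
        print(nits, round(pq_encode(nits), 4), round(mu_law_encode(nits), 4))
```

Both functions compress the wide linear range into 0-1, so equal steps in the encoded value correspond more closely to equal perceived steps than equal steps in linear luminance do.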

Paper Structure

This paper contains 19 sections, 5 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Perceptual encoding functions used to transform linear RGB color values to an approximately perceptually uniform space. A logarithmic scale is used for the x-axis to account for Weber's law. Following BT.2100, the range of interest for encoded values is between 0.005 and 10 000. Note, however, that in our experiments, we used a smaller range between 0 and 4 000. Since $\mu$-law encoding expects input values between 0 and 1, the values were divided by 10 000 before passing to the function in Eq. \ref{eq:mu-law}.
  • Figure 2: Training on HDR/RAW images. Training pairs were first (optionally) encoded from a linear to a perceptually uniform color space using PQ, PU21 or a $\mu$-law transfer function (explained in Section \ref{sec:representations}). If no pixel encoding was applied (linear), one of the loss functions from Section \ref{sec:loss-functions} was used to account for the perceptual non-uniformity.
  • Figure 3: Single-image super-resolution results for the HDR (top) and RAW (bottom) datasets, the two networks (Real-ESRGAN and EDSR), and three metrics. The violin shape represents the distribution across the testing set. A thin black line shows the range of quality scores (excluding outliers), and the thicker line shows the region between the 25th and 75th percentiles. The dot and numerical value represent the median. Red horizontal lines above the violin plots denote the groups of conditions for which we have no evidence of statistically significant differences. Results are sorted from the best on the left to the worst on the right. The colors are different for each combination of representation and loss and are consistent across plots.
  • Figure 4: Deblurring results for the two networks (columns) and three metrics (rows). A few configurations are missing for GFNet because network training failed to converge for them. The notation is the same as in Figure 3.
  • Figure 5: Denoising results for the two networks (columns) and three metrics (rows). The notation is the same as in Figure 3.
  • ...and 6 more figures
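
The Figure 2 caption describes optionally encoding training pairs before computing the loss. The sketch below is one way to see why that choice matters: it compares a plain L1 loss on linear values with an L1 loss on PQ-encoded values for the same 10% relative error in a dark and a bright pixel. It is only an illustration under stated assumptions. The helper names (`l1`, `pq_encode`) and the sample luminances are hypothetical, and the paper's own loss functions are not reproduced here.

```python
# PQ constants as defined in SMPTE ST 2084.
PQ_M1 = 2610 / 16384
PQ_M2 = 2523 / 4096 * 128
PQ_C1 = 3424 / 4096
PQ_C2 = 2413 / 4096 * 32
PQ_C3 = 2392 / 4096 * 32


def pq_encode(luminance_nits: float) -> float:
    """Map absolute luminance (cd/m^2) to a PQ code value in [0, 1]."""
    y = min(max(luminance_nits, 0.0), 10000.0) / 10000.0
    y_m1 = y ** PQ_M1
    return ((PQ_C1 + PQ_C2 * y_m1) / (1.0 + PQ_C3 * y_m1)) ** PQ_M2


def l1(pred: list[float], target: list[float]) -> float:
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


# Hypothetical pixels: the prediction is 10% too bright in both cases.
dark_target, dark_pred = [1.0], [1.1]            # about 1 cd/m^2
bright_target, bright_pred = [1000.0], [1100.0]  # about 1000 cd/m^2

# L1 on linear values: the bright error dominates by a factor of about 1000.
linear_dark = l1(dark_pred, dark_target)
linear_bright = l1(bright_pred, bright_target)

# L1 on PQ-encoded values: the two errors have comparable magnitude.
encoded_dark = l1([pq_encode(v) for v in dark_pred],
                  [pq_encode(v) for v in dark_target])
encoded_bright = l1([pq_encode(v) for v in bright_pred],
                    [pq_encode(v) for v in bright_target])

if __name__ == "__main__":
    print("linear  bright/dark loss ratio:", linear_bright / linear_dark)
    print("encoded bright/dark loss ratio:", encoded_bright / encoded_dark)
```

With a linear L1 loss, errors in bright regions contribute far more to the loss than equally visible errors in dark regions. Encoding the pixels first brings the two contributions much closer together.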