
Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays

Andrei Chubarau, Hyunjin Yoo, Tara Akhavan, James Clark

TL;DR

The paper tackles HDR IQA by re-targeting SDR-trained networks to HDR content using perceptually uniform (PU) encodings and a domain adaptation framework. The authors propose a training recipe that pre-trains on SDR data, fine-tunes on PU-encoded HDR data, and optionally applies CORAL-based domain adaptation to align SDR and HDR feature representations, enabling effective transfer with limited HDR data. Empirical results on SDR and HDR benchmarks show faster convergence and improved HDR generalization, with notable gains from CORAL and from synthetic HDR-like data. The approach offers a practical way to leverage abundant SDR data for HDR IQA and may extend to other HDR vision tasks. The main contributions are a detailed study of PU encoding normalization, a CORAL-based domain adaptation strategy for HDR transfer, and demonstrated performance gains on UPIQ and related datasets.
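The TL;DR names CORAL as the domain adaptation component but does not say which features are aligned or how the loss is weighted. As a minimal sketch, assuming the standard Deep CORAL formulation (Sun and Saenko, 2016), the loss is the squared Frobenius distance between the source (SDR) and target (HDR) feature covariances, scaled by $1/(4d^2)$ for feature dimension $d$. This NumPy version only illustrates the formula; during training it would be computed on framework tensors so gradients flow, and added to the quality-regression loss with a weight that is a hypothetical hyperparameter here.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Deep CORAL loss between two feature batches (standard formulation, assumed).

    source: (n_s, d) features from SDR (source-domain) images
    target: (n_t, d) features from HDR (target-domain) images
    """
    if source.ndim != 2 or target.ndim != 2 or source.shape[1] != target.shape[1]:
        raise ValueError("expected 2-D feature matrices with the same feature dimension d")
    d = source.shape[1]
    # Unbiased (n - 1) covariance of each domain; rows are samples, columns are features.
    cov_s = np.atleast_2d(np.cov(source, rowvar=False))
    cov_t = np.atleast_2d(np.cov(target, rowvar=False))
    # Squared Frobenius norm of the covariance gap, normalized by 4 * d^2.
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))


# Usage: identical feature statistics give zero loss; a rescaled domain does not.
rng = np.random.default_rng(0)
sdr_feats = rng.normal(size=(128, 16))
hdr_feats = 2.0 * rng.normal(size=(128, 16)) + 1.0  # hypothetical shifted/scaled domain
print(coral_loss(sdr_feats, sdr_feats))  # 0.0
print(coral_loss(sdr_feats, hdr_feats) > 0.0)  # True
```

Because CORAL matches only second-order statistics, a pure mean shift between domains leaves the loss at (numerically) zero; it targets differences in feature spread and correlation.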

Abstract

Conventional image quality metrics (IQMs), such as PSNR and SSIM, are designed for perceptually uniform gamma-encoded pixel values and cannot be directly applied to perceptually non-uniform linear high-dynamic-range (HDR) colors. Similarly, most of the available datasets consist of standard-dynamic-range (SDR) images collected in standard and possibly uncontrolled viewing conditions. Popular pre-trained neural networks are likewise intended for SDR inputs, restricting their direct application to HDR content. On the other hand, training HDR models from scratch is challenging due to limited available HDR data. In this work, we explore more effective approaches for training deep learning-based models for image quality assessment (IQA) on HDR data. We leverage networks pre-trained on SDR data (source domain) and re-target these models to HDR (target domain) with additional fine-tuning and domain adaptation. We validate our methods on the available HDR IQA datasets, demonstrating that models trained with our combined recipe outperform previous baselines, converge much quicker, and reliably generalize to HDR inputs.
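The abstract's point, that metrics such as PSNR expect perceptually uniform values rather than linear luminance, can be made concrete with a small sketch. It maps SDR pixel values to absolute luminance with a gain-gamma-offset display model, then applies the SMPTE ST 2084 (PQ) curve scaled to 0 to 255 (one of the two encodings compared in Figure 1) before computing PSNR. The display-model form and its parameters (`l_peak`, `l_black`, `gamma`, `e_amb`, `k_refl`) are assumptions for illustration and may differ from the paper's own display-model equation. PU21 is not reproduced because its fitted coefficients are not given in this summary; a PU21 encoder would take the place of `pq_encode_255`.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def display_model(v, l_peak=200.0, l_black=0.2, gamma=2.2, e_amb=0.0, k_refl=0.005):
    """Map display-encoded values v in [0, 1] to absolute luminance in cd/m^2.

    Gain-gamma-offset model with a pure power law standing in for the sRGB
    non-linearity. All default parameter values are hypothetical.
    """
    v = np.clip(np.asarray(v, dtype=np.float64), 0.0, 1.0)
    l_refl = k_refl / np.pi * e_amb  # ambient light reflected by the screen
    return (l_peak - l_black) * v ** gamma + l_black + l_refl


def pq_encode_255(lum):
    """PQ-encode absolute luminance (cd/m^2, clipped to 10,000) and scale to [0, 255]."""
    y = np.clip(np.asarray(lum, dtype=np.float64) / 10000.0, 0.0, 1.0)
    ym = y ** M1
    return 255.0 * ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def psnr(ref, dist, peak=255.0):
    """PSNR in dB; on PQ-encoded values, peak=255 matches the scaling above."""
    diff = np.asarray(ref, dtype=np.float64) - np.asarray(dist, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Usage: a hypothetical SDR luminance channel and a noisy copy, compared in PQ units.
rng = np.random.default_rng(0)
ref_srgb = rng.uniform(size=(64, 64))
dist_srgb = np.clip(ref_srgb + rng.normal(scale=0.02, size=ref_srgb.shape), 0.0, 1.0)
ref_pq = pq_encode_255(display_model(ref_srgb))
dist_pq = pq_encode_255(display_model(dist_srgb))
print(psnr(ref_pq, dist_pq))
```

Linear HDR inputs that are already in cd/m² would skip `display_model` and go straight to the encoder, which is what lets SDR and HDR images share one input space.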

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Perceptually uniform encoding with PU21 [pu21] (banding + glare variant) and PQ [pq] (scaled by 255), contrasted with the approximate mapping between luminance and the sRGB non-linearity for typical CRT and LCD displays (simulated with the paper's display model equation). Left: the full range of encoded luminance. Right: the mapping between sRGB values and PU units.
  • Figure 2: Diagram of the PieAPP quality metric [pieapp]. For each $64 \times 64$ patch, deep feature activations from 5 convolutional layers are computed and concatenated; a fully-connected layer predicts the quality score from the difference between the reference and distorted patch features. Features from the last layer are used to predict patch weights. The final image quality is computed as a weighted sum of patch-wise scores. Adapted from [pieapp].
  • Figure 3: Diagram of VTAMIQ [vtamiq]. Patches from the reference and distorted images are encoded by a Vision Transformer (ViT) [vit2020]; the difference between the corresponding CLS tokens is computed and calibrated by a series of residual groups (RGs) built from channel attention (CA) modules. A fully-connected layer (MLP) predicts the final quality score. Adapted from [vtamiq].
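Figure 2 describes PieAPP's patch-based evaluation: the image is split into $64 \times 64$ patches, each patch receives a score and a weight, and the image score pools them. A minimal sketch of the patching and pooling steps follows. The feature extractor and the score and weight layers are omitted, the stride is treated as a free parameter, and the helper names are hypothetical. We assume the "weighted sum" is normalized by the total weight, i.e. a weighted average.

```python
import numpy as np

PATCH_SIZE = 64  # patch size stated in Figure 2


def extract_patches(img: np.ndarray, size: int = PATCH_SIZE, stride: int = PATCH_SIZE) -> np.ndarray:
    """Cut an (H, W, C) image into size x size patches; border pixels that do not fill a patch are dropped."""
    h, w = img.shape[:2]
    patches = [
        img[y:y + size, x:x + size]
        for y in range(0, h - size + 1, stride)
        for x in range(0, w - size + 1, stride)
    ]
    if not patches:
        raise ValueError("image is smaller than one patch")
    return np.stack(patches)


def pool_patch_scores(scores, weights) -> float:
    """Weighted average of per-patch scores (weights normalized by their total)."""
    s = np.asarray(scores, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    if s.shape != w.shape:
        raise ValueError("scores and weights must have the same shape")
    total = w.sum()
    if total <= 0:
        raise ValueError("patch weights must have a positive sum")
    return float((w * s).sum() / total)


# Usage with made-up per-patch outputs standing in for the score and weight heads.
print(extract_patches(np.zeros((130, 200, 3))).shape)  # (6, 64, 64, 3)
print(pool_patch_scores([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))  # 2.25
```

Learned weights let the metric emphasize patches where distortions are most visible instead of averaging all patches equally.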