
Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey

Nicolas Chahine, Marcos V. Conde, Daniela Carfora, Gabriel Pacianotto, Benoit Pochon, Sira Ferradans, Radu Timofte

TL;DR

Portrait Quality Assessment (PQA) addresses the problem of ranking the perceptual quality of portraits captured under diverse conditions. This NTIRE 2024 survey reviews the Portrait Quality Assessment Challenge and its top-performing methods (RQ-Net, BDVQA, PQE, MoNet, and SECE-SYSU). It highlights architectures that combine global and local reasoning, ranking objectives, mean-opinion aggregation, and scene-adaptive fusion to improve cross-scene generalization. The results reveal a persistent generalization gap when test images come from new device domains, which underscores the need for robust, scene-aware representations and diverse pre-training. Overall, the work advances the state of the art in portrait quality estimation by detailing transformer-based and gating strategies that better capture portrait semantics and scene context for practical portrait QA systems.
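The ranking objectives mentioned above can be illustrated with a generic pairwise hinge loss. In this loss, a model is penalised whenever it orders two portraits differently from their mean opinion scores (MOS). This is a minimal sketch under that assumption, not the loss used by any of the listed teams. The function name `pairwise_ranking_loss` and the `margin` parameter are hypothetical.

```python
from itertools import combinations


def pairwise_ranking_loss(preds, mos, margin=0.0):
    """Mean hinge loss over all pairs of images whose MOS differ.

    For a pair (i, j) with mos[i] > mos[j], the pair contributes zero loss
    only if preds[i] exceeds preds[j] by at least `margin`.
    (Hypothetical helper for illustration.)
    """
    if len(preds) != len(mos):
        raise ValueError("preds and mos must have the same length")
    losses = []
    for i, j in combinations(range(len(preds)), 2):
        if mos[i] == mos[j]:
            continue  # tied opinions carry no ordering information
        sign = 1.0 if mos[i] > mos[j] else -1.0
        losses.append(max(0.0, margin - sign * (preds[i] - preds[j])))
    return sum(losses) / len(losses) if losses else 0.0
```

Take three portraits with MOS `[4.0, 3.0, 1.0]` and `margin=0.1`. Predictions `[0.9, 0.5, 0.1]` preserve every pairwise order and give a loss of 0. The reversed predictions `[0.1, 0.5, 0.9]` invert every pair and give a positive loss. Only relative order matters, which suits a task framed as ranking rather than absolute score regression.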

Abstract

This paper reviews the NTIRE 2024 Portrait Quality Assessment Challenge, highlighting the proposed solutions and results. The challenge aims to obtain an efficient deep neural network capable of estimating the perceptual quality of real portrait photos. The methods must generalize across diverse scenes, lighting conditions (indoor, outdoor, low-light), motion, blur, and other challenging conditions. In the challenge, 140 participants registered, and 35 submitted results during the challenge period. The performance of the top 5 submissions is reviewed and provided here as a gauge of the current state of the art in Portrait Quality Assessment.

Paper Structure (18 sections, 6 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Sample portraits from the NTIRE 2024 Portrait Quality Assessment Challenge testing set.
  • Figure 2: Examples from the train/test split of PIQ23 (Chahine et al., CVPR 2023). The test set incorporates framing settings, backgrounds, subject characteristics, and weather conditions that differ significantly from those in the training set.
  • Figure 3: Sample images from two scenes of the challenge generalization test set. The first three image columns were taken with different smartphone devices, while the last column was taken with a DSLR camera and edited by a professional photographer.
  • Figure 4: Diagram of FULL-HyperIQA (FHIQA). The figure illustrates how FHIQA processes input images, extracts semantic information, and adapts the quality prediction based on scene-specific evaluations.
  • Figure 5: Diagram of the RQ-Net proposed by Team Xidian IPPL.
  • ...and 4 more figures
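The Figure 4 caption describes FHIQA as adapting its quality prediction to scene-specific semantics. HyperIQA-style models realise this kind of adaptation with a hypernetwork: semantic features of the image generate the weights of a small quality regressor, so the mapping from quality features to score changes with the scene. The toy sketch below shows only that general idea. The linear hypernetwork, the dimensions, and the names `linear` and `HyperRegressor` are hypothetical, and this is not FHIQA's actual architecture.

```python
import random


def linear(x, W, b):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


class HyperRegressor:
    """Toy hypernetwork (hypothetical): a semantic vector generates the
    weights and bias of a one-layer regressor applied to quality features."""

    def __init__(self, sem_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        n_out = feat_dim + 1  # generated regressor weights plus its bias
        self.H = [[rng.gauss(0.0, 0.1) for _ in range(sem_dim)]
                  for _ in range(n_out)]
        self.h_bias = [0.0] * n_out

    def __call__(self, semantic, quality_feat):
        # 1) Generate scene-specific regressor parameters from semantics.
        params = linear(semantic, self.H, self.h_bias)
        w, b = params[:self.feat_dim], params[self.feat_dim]
        # 2) Apply the generated regressor to the quality features.
        return sum(wi * fi for wi, fi in zip(w, quality_feat)) + b
```

The same quality features produce different scores under different semantic vectors. That is the sense in which the prediction is "scene-adaptive". In a real model, both the hypernetwork and the feature extractors would be deep networks trained end to end.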