Dual-Branch Network for Portrait Image Quality Assessment
Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai
TL;DR
The paper addresses portrait image quality assessment (PIQA) by introducing a dual-branch network that separately models the influence of the full portrait and the facial region. It employs two Swin Transformer-B backbones pretrained on the large-scale LSVQ and GFIQA datasets, augmented with LIQE auxiliary features, and learns with a fidelity loss in a learning-to-rank framework to handle cross-scene score inconsistencies. The approach yields state-of-the-art performance on the PIQ dataset and competitive NTIRE 2024 results, validated by comprehensive ablations showing the benefits of jointly modeling face and background and of using LIQE features. The work provides a practical, scalable solution for PIQA with publicly released code, enabling more robust portrait-quality assessment in real-world applications.
Abstract
Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capturing devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA), which accounts for how the salient person and the background of a portrait image each influence its visual quality. Specifically, we utilize two backbone networks (\textit{i.e.}, Swin Transformer-B) to extract quality-aware features from the entire portrait image and from the facial image cropped from it. To enhance the quality-aware feature representation of the backbones, we pre-train them on the large-scale video quality assessment dataset LSVQ and the large-scale facial image quality assessment dataset GFIQA, respectively. Additionally, we leverage LIQE, an image scene classification and quality assessment model, to capture quality-aware and scene-specific features as auxiliary features. Finally, we concatenate these features and regress them into quality scores via a multilayer perceptron (MLP). We train the model with the fidelity loss in a learning-to-rank manner to mitigate inconsistencies in the quality scores of the portrait image quality assessment dataset PIQ. Experimental results demonstrate that the proposed model achieves superior performance on the PIQ dataset, validating its effectiveness. The code is available at \url{https://github.com/sunwei925/DN-PIQA.git}.
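To illustrate the learning-to-rank training mentioned above, the following is a minimal sketch of a pairwise fidelity loss, not the released implementation. It assumes a Thurstone-style model with unit-variance Gaussian scores to turn a predicted score difference into a preference probability, and it assumes that the two images of a pair come from the same scene so that their subjective scores are comparable. The function names `normal_cdf` and `fidelity_loss` are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fidelity_loss(score_a: float, score_b: float,
                  mos_a: float, mos_b: float) -> float:
    """Pairwise fidelity loss for one image pair (a, b).

    score_a, score_b: predicted quality scores.
    mos_a, mos_b: subjective quality scores, used only to derive
    the ground-truth preference (an assumption of this sketch).
    """
    # Predicted probability that a is better than b, assuming both
    # scores are Gaussian with unit variance (assumption).
    p_hat = normal_cdf((score_a - score_b) / math.sqrt(2.0))

    # Ground-truth preference: 1 if a is better, 0 if b is better,
    # 0.5 for a tie.
    if mos_a > mos_b:
        p = 1.0
    elif mos_a < mos_b:
        p = 0.0
    else:
        p = 0.5

    # Fidelity loss: 1 - sqrt(p * p_hat) - sqrt((1 - p) * (1 - p_hat)).
    return 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))


if __name__ == "__main__":
    # Predicted order agrees with the subjective order: small loss.
    print(fidelity_loss(5.0, 0.0, mos_a=4.0, mos_b=2.0))
    # Predicted order contradicts the subjective order: large loss.
    print(fidelity_loss(0.0, 5.0, mos_a=4.0, mos_b=2.0))
```

Because the loss depends only on the relative order within a pair, it does not require the absolute scores of different scenes to share a common scale, which is one plausible reading of why a ranking objective suits PIQ. In an actual training loop the same expression would be written with differentiable tensor operations and usually a small constant inside the square roots for numerical stability.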
