Surveillance Facial Image Quality Assessment: A Multi-dimensional Dataset and Lightweight Model

Yanwei Jiang, Wei Sun, Yingjie Zhou, Xiangyang Zhu, Yuqin Cao, Jun Jia, Yunhao Li, Sijing Wu, Dandan Zhu, Xingkuo Min, Guangtao Zhai

TL;DR

This work targets surveillance facial image quality assessment, addressing both perceptual quality and face fidelity under real-world conditions. It introduces SFIQA-Bench, a dataset of 5,004 real surveillance facial images annotated across six quality dimensions, and analyzes the mean opinion score (MOS) distributions and inter-dimension correlations, which motivate multi-dimensional evaluation. It then proposes SFIQA-Assessor, a lightweight multi-view FIQA model with cross-view fusion and a task-aware decoder that jointly and efficiently predicts six quality scores, outperforming many baselines on SFIQA-Bench and generalizing to FIQA datasets. The results indicate practical value for real-time surveillance pipelines, enabling more reliable identity verification while accounting for restoration artifacts and diverse capture conditions.

Abstract

Surveillance facial images are often captured under unconstrained conditions, resulting in severe quality degradation due to factors such as low resolution, motion blur, occlusion, and poor lighting. Although recent face restoration techniques applied to surveillance cameras can significantly enhance visual quality, they often compromise fidelity (i.e., identity-preserving features), which directly conflicts with the primary objective of surveillance images: reliable identity verification. Existing facial image quality assessment (FIQA) methods predominantly focus on either visual quality or recognition-oriented evaluation, and thus fail to jointly address visual quality and fidelity, both of which are critical for surveillance applications. To bridge this gap, we present the first comprehensive study on surveillance facial image quality assessment (SFIQA), targeting the unique challenges inherent to surveillance scenarios. Specifically, we first construct SFIQA-Bench, a multi-dimensional quality assessment benchmark consisting of 5,004 surveillance facial images captured by three widely deployed surveillance cameras in real-world scenarios. A subjective experiment is conducted to collect quality ratings along six dimensions (noise, sharpness, colorfulness, contrast, fidelity, and overall quality), covering the key aspects of SFIQA. Furthermore, we propose SFIQA-Assessor, a lightweight multi-task FIQA model that jointly exploits complementary facial views through cross-view feature interaction and employs learnable task tokens to guide the unified regression of multiple quality dimensions. Experimental results on the proposed dataset show that our method outperforms state-of-the-art general image quality assessment (IQA) and FIQA methods, validating its effectiveness for real-world surveillance applications.
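
The multi-dimensional ratings described above are typically summarized as per-image mean opinion scores (MOS) and compared across dimensions with a rank correlation. The following is a minimal sketch of that kind of analysis, not the authors' protocol: the function names, the toy ratings, and the choice of Spearman correlation are assumptions made for illustration, and only the six dimension names come from the abstract.

```python
from statistics import mean

# Dimension names taken from the abstract; everything else here is hypothetical.
DIMENSIONS = ["noise", "sharpness", "colorfulness", "contrast", "fidelity", "overall"]


def mean_opinion_scores(ratings):
    """Map {image_id: {dimension: [raw scores]}} to {image_id: {dimension: MOS}}."""
    return {
        image_id: {dim: mean(scores) for dim, scores in per_dim.items()}
        for image_id, per_dim in ratings.items()
    }


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy ratings (made up): three raters, four images, two of the six dimensions.
toy_ratings = {
    "img_a": {"sharpness": [4, 5, 3], "overall": [4, 4, 4]},
    "img_b": {"sharpness": [2, 2, 2], "overall": [3, 2, 1]},
    "img_c": {"sharpness": [1, 1, 1], "overall": [1, 1, 1]},
    "img_d": {"sharpness": [5, 5, 5], "overall": [5, 5, 5]},
}

mos = mean_opinion_scores(toy_ratings)
image_ids = sorted(mos)
sharpness = [mos[i]["sharpness"] for i in image_ids]
overall = [mos[i]["overall"] for i in image_ids]
print("Spearman(sharpness, overall) =", spearman(sharpness, overall))
```

A rank correlation is used here because opinion scores are ordinal; a real analysis would cover all six dimensions and would likely screen out unreliable raters before averaging.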

Paper Structure

This paper contains 32 sections, 14 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: (a) High-quality reference facial images. (b) Corresponding degraded versions generated by applying JPEG compression and blur. (c) and (d) Restored images produced by two representative face restoration methods [yang2021gan, yue2024difface]. While the restorations enhance perceptual quality, they may introduce hallucinated textures or pseudo-structures, potentially compromising identity fidelity.
  • Figure 2: Illustration of the misalignment between recognition quality and perceptual quality. A checkmark indicates a much higher quality score than a cross mark. CLIB-FIQA scores reflect recognition quality, while SFIQA-Bench scores represent perceptual quality. Compared to CLIB-FIQA, the proposed SFIQA-Assessor produces scores more consistent with human judgments.
  • Figure 3: Representative examples of real-world surveillance facial images in SFIQA-Bench. The images cover six typical scenarios: indoor, outdoor, and in-vehicle surveillance captured during both daytime and nighttime.
  • Figure 4: Distributions of some attributes in SFIQA-Bench.
  • Figure 5: Histogram of image resolutions in SFIQA-Bench. All images are square, so only one side of the resolution is shown.
  • ...and 6 more figures