Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Yujia Liu, Chenxi Yang, Dingquan Li, Jianhao Ding, Tingting Jiang

TL;DR

This work presents theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image, and proposes a norm regularization training strategy aimed at boosting the robustness of NR-IQA models.

Abstract

The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulting in significant changes in predicted scores. In this paper, we propose a defense method to improve the stability in predicted scores when attacked by small perturbations, thus enhancing the adversarial robustness of NR-IQA models. To be specific, we present theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image. Building upon this theoretical foundation, we propose a norm regularization training strategy aimed at reducing the $\ell_1$ norm of the gradient, thereby boosting the robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area.
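
The link between score changes and the $\ell_1$ norm of the input gradient can be illustrated on a toy model. The sketch below is not the paper's implementation: the `tanh` scorer, the weights, and every function name are hypothetical stand-ins for a real NR-IQA network. It compares the score change caused by a sign-based $\ell_\infty$ perturbation of strength `eps` with `eps` times the $\ell_1$ gradient norm, and shows one plausible form of a loss that penalizes that norm.

```python
import math

def score(w, b, x):
    """Toy 'quality model' (assumption): a smooth scalar function of the input."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(z)

def input_gradient(w, b, x):
    """Analytic gradient of score() with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    slope = 1.0 - math.tanh(z) ** 2  # derivative of tanh at z
    return [slope * wi for wi in w]

def l1_norm(v):
    """Sum of absolute values of the entries of v."""
    return sum(abs(vi) for vi in v)

def sign_attack(x, grad, eps):
    """Shift each coordinate by eps along the sign of the gradient (l_inf budget eps)."""
    return [xi + eps * math.copysign(1.0, gi) if gi != 0 else xi
            for xi, gi in zip(x, grad)]

def regularized_loss(w, b, x, target, lam):
    """Squared error plus lam times the l1 norm of the input gradient (illustrative)."""
    err = (score(w, b, x) - target) ** 2
    return err + lam * l1_norm(input_gradient(w, b, x))

# Hypothetical weights and a three-"pixel" input.
w = [0.3, -0.2, 0.1]
b = 0.05
x = [0.5, 0.4, 0.9]
eps = 1e-3

grad = input_gradient(w, b, x)
x_adv = sign_attack(x, grad, eps)

actual_change = abs(score(w, b, x_adv) - score(w, b, x))
first_order_estimate = eps * l1_norm(grad)

print("actual score change: ", actual_change)
print("eps * ||grad||_1:    ", first_order_estimate)
```

For small `eps` the two printed values nearly coincide, so shrinking the $\ell_1$ gradient norm during training, as `regularized_loss` does through the weight `lam`, directly shrinks the first-order score change an attacker can cause.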

Paper Structure

This paper contains 32 sections, 1 theorem, 23 equations, 15 figures, and 11 tables.

Key Result

Theorem 1

Suppose $f$ represents an NR-IQA model, $\epsilon$ is the strength of an attack, and $x$ denotes an input image. The maximum change in the score that $f$ predicts for $x$ under $\ell_\infty$-bounded attacks is highly correlated with $\Vert \nabla_x f(x) \Vert_1$.
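
One way to see where the $\ell_1$ norm comes from is a first-order argument. The following is only a sketch, assuming $f$ is differentiable at $x$ and $\epsilon$ is small enough for a linear approximation to hold; it is not necessarily the paper's exact statement or proof. Here $\delta$ denotes the perturbation added to the image.

$$
\begin{aligned}
f(x+\delta) - f(x) &\approx \nabla_x f(x)^\top \delta, \\
\max_{\Vert \delta \Vert_\infty \le \epsilon} \left\vert \nabla_x f(x)^\top \delta \right\vert &= \epsilon \, \Vert \nabla_x f(x) \Vert_1 .
\end{aligned}
$$

The second line follows from Hölder's inequality, with the maximum attained at $\delta = \epsilon \, \mathrm{sign}(\nabla_x f(x))$. Under these assumptions, $\max_{\Vert \delta \Vert_\infty \le \epsilon} \vert f(x+\delta) - f(x) \vert \approx \epsilon \, \Vert \nabla_x f(x) \Vert_1$, so a smaller gradient norm implies a smaller worst-case score change to first order.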

Figures (15)

  • Figure 1: Comparison of DBCNN [2020_TCSVT_DBCNN] trained with and without the proposed Norm regularization Training (NT) strategy under the Perceptual Attack [2022_NIPS_Zhang_PAttack] using the same setting. The absolute differences between predicted scores before and after the attack ($\vert s_\text{after}-s_\text{before}\vert$) for all test images are presented, with the fitted distribution displayed on the right side of the picture. An example is shown with predicted scores before and after the attack (zoom in for a better view). It is evident that DBCNN+NT exhibits smaller score changes than the baseline model.
  • Figure 2: Examples of adversarial attacks on the DBCNN [2020_TCSVT_DBCNN] model (zoom in for a better view). The range of MOS is $[0,100]$.
  • Figure 3: Three dimensions in experimental settings.
  • Figure 4: The comparison of $\ell_1$ norm distribution of gradient between baseline models (blue) and baseline+NT models (orange).
  • Figure 5: The relationship between the gradient norm and the robustness in terms of RMSE (left) and SROCC (right). The horizontal axis represents the logarithm of the average $\Vert \nabla_x f(x)\Vert_1$ value across all test images. All metrics are calculated between predicted scores before and after the UAP attack.
  • ...and 10 more figures

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof