
A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation

Dawei Zhou, Suzhi Gang, Decheng Liu, Tongliang Liu, Nannan Wang, Xinbo Gao

TL;DR

This work addresses the security risks of malicious visual manipulation and the shortcomings of data-only defenses by proposing a knowledge-guided adversarial defense (KGAD). KGAD jointly leverages domain-specific knowledge and visual-perception cues to generate adversarial noise that forces manipulation models to produce semantically confused outputs, improving protection across face- and style-manipulation tasks. The method minimizes a combined loss $L_{KGAD} = L_{pk} + \lambda L_{dk}$ with $L_{dk} = - \ell_d(\mathcal{K}_d(G_\theta(x)), \mathcal{K}_d(G_\theta(x+\delta)))$ and $L_{pk} = - \Delta_{pk}(G_\theta(x), G_\theta(x+\delta))$. Because both terms are negated distances, minimizing $L_{KGAD}$ drives the output on the protected input $x+\delta$ away from the output on the clean input $x$. Here $\Delta_{pk}$ is a perceptual metric such as SSIMD or LPIPS, and $\mathcal{K}_d$ extracts domain features such as keypoints or content. Experiments on CelebA and Monet2Photo with StarGAN, AGGAN, HiSD, CycleGAN, and AdaAttN show stronger distortion, better generalization, and better transferability than state-of-the-art defenses, supporting KGAD's potential to mitigate real-world risks from deepfakes and other malicious manipulations.
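The optimization described above can be sketched as projected sign-gradient descent on $L_{KGAD}$ inside an $L_\infty$ budget. This is a minimal, self-contained sketch under stated assumptions, not the authors' implementation: `G`, `K_d`, the random weights `W` and `P`, the single-window SSIM dissimilarity used for $\Delta_{pk}$, the squared $L_2$ distance used for $\ell_d$, and the values of `eps`, `alpha`, `steps`, and `lam` are all hypothetical choices for illustration. Finite-difference gradients keep the example NumPy-only; a real defense would backpropagate through the manipulation model with an autograd framework.

```python
import numpy as np

D = 16  # length of a toy flattened "image"
_rng = np.random.default_rng(0)
# Hypothetical stand-ins, NOT the paper's networks:
W = _rng.normal(scale=0.5, size=(D, D))  # weights of a toy manipulation model G_theta
P = _rng.normal(size=(4, D))             # projection used as a toy knowledge extractor K_d


def G(x):
    """Toy differentiable 'manipulation model' with outputs in [0, 1]."""
    return 0.5 * (np.tanh(W @ x) + 1.0)


def K_d(y):
    """Toy domain-knowledge extractor (stands in for, e.g., a keypoint detector)."""
    return P @ y


def dssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM dissimilarity (1 - SSIM) / 2, a simplified stand-in for Delta_pk."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2))
    return (1.0 - ssim) / 2.0


def kgad_loss(x, delta, lam=1.0):
    """L_KGAD = L_pk + lam * L_dk; both terms are negated clean-vs-protected distances."""
    y_clean, y_adv = G(x), G(x + delta)
    l_pk = -dssim(y_clean, y_adv)
    l_dk = -np.sum((K_d(y_clean) - K_d(y_adv)) ** 2)  # ell_d = squared L2 (assumption)
    return l_pk + lam * l_dk


def numerical_grad(f, delta, h=1e-5):
    """Central finite differences; a real implementation would use autograd."""
    grad = np.zeros_like(delta)
    for i in range(delta.size):
        e = np.zeros_like(delta)
        e[i] = h
        grad[i] = (f(delta + e) - f(delta - e)) / (2 * h)
    return grad


def kgad_attack(x, eps=0.05, alpha=0.01, steps=20, lam=1.0, seed=1):
    """Sign-gradient descent on L_KGAD inside an L_inf ball of radius eps."""
    # Random start: at delta = 0 both distances are minimal, so the gradient vanishes.
    delta = np.random.default_rng(seed).uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        grad = numerical_grad(lambda d: kgad_loss(x, d, lam), delta)
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)  # descend, stay in budget
        delta = np.clip(x + delta, 0.0, 1.0) - x                   # keep x + delta a valid image
    return delta


x = np.random.default_rng(2).uniform(0.2, 0.8, size=D)
delta = kgad_attack(x)
print("L_KGAD with protective noise:", kgad_loss(x, delta))
```

The random initialization is a deliberate design choice: starting from $\delta = 0$, the gradients of both distance terms are zero and a sign step would not move the noise.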

Abstract

Malicious applications of visual manipulation pose serious threats to the security and reputation of users in many fields. To alleviate these issues, adversarial noise-based defenses have been actively studied in recent years. However, "data-only" methods tend to distort fake samples in the low-level feature space rather than the high-level semantic space, which limits their ability to resist malicious manipulation. Frontier research has shown that integrating knowledge into deep learning can produce reliable and generalizable solutions. Inspired by this, we propose a knowledge-guided adversarial defense (KGAD) that actively forces malicious manipulation models to output semantically confusing samples. Specifically, when generating adversarial noise, we focus on constructing significant semantic confusion at the level of domain-specific knowledge, and we replace general pixel-wise metrics with a metric closely related to visual perception. The generated adversarial noise actively interferes with the malicious manipulation model by triggering knowledge-guided and perception-related disruptions in the fake samples. To validate the effectiveness of the proposed method, we conduct qualitative and quantitative experiments on human perception and visual quality assessment. The results on two different tasks show that our defense provides better protection than state-of-the-art methods and generalizes well.
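The point about replacing pixel-wise metrics with a perception-related one can be illustrated with a toy 1-D example. This sketch assumes a simplified single-window SSIM dissimilarity as the perception-related metric (the paper mentions SSIMD and LPIPS; this global variant is only a stand-in), and the signal `y` and both distortions are hypothetical. The two distortions have the same pixel-wise MSE, but the uniform brightness shift barely changes SSIM, while the alternating perturbation that breaks local structure is scored as far more dissimilar.

```python
import numpy as np


def mse(a, b):
    """Pixel-wise mean squared error."""
    return float(np.mean((a - b) ** 2))


def dssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM dissimilarity (1 - SSIM) / 2 for signals in [0, 1]."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2))
    return float((1.0 - ssim) / 2.0)


y = np.linspace(0.2, 0.8, 64)                             # smooth reference "image" row
brightness_shift = y + 0.1                                # color-like change, structure kept
structural_noise = y + 0.1 * (-1.0) ** np.arange(64)      # same magnitude, structure broken

print("MSE:  ", mse(y, brightness_shift), mse(y, structural_noise))
print("DSSIM:", dssim(y, brightness_shift), dssim(y, structural_noise))
```

An objective that only maximizes MSE cannot tell these two cases apart, whereas the structure-aware score prefers the perturbation that disrupts structure.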

Paper Structure

This paper contains 11 sections, 4 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Distorted fake samples corrupted by adversarial noise-based defenses. The third and fourth images in the top row are corrupted by a general defense and by our proposed method, respectively. The disruptions in the former are mainly concentrated in local color textures, while the overall structure remains clear. In contrast, the disruptions in the latter significantly perturb the semantic information (e.g., the face structure), which causes more confusion to human viewers. The face in the fake sample corrupted by the general defense closely matches the face in the original input (see the middle images), so the identity can still be exploited for malicious purposes; our method mitigates this issue. Moreover, we use an image-understanding model to describe the content of each image. Its descriptions of the three samples show that the general defense leaves critical information intact, while our method obfuscates it more thoroughly.
  • Figure 2: The purpose of the proposed method. Unlike detection-based passive defense (top), our method (bottom) embeds human-imperceptible noise into input samples to perturb malicious manipulation models. Moreover, unlike general active defenses (middle), our method constructs the adversarial noise under the guidance of domain-specific and visual-perception knowledge, so that the distorted fake samples exhibit obvious anomalies and confusing semantics and critical information (e.g., identity) is more thoroughly obfuscated.
  • Figure 3: Limitations of general adversarial noise-based defenses. Although the distorted fake samples show anomalies compared with the original fake samples, the face appearance is still clearly visible and the keypoints are still detected normally. Furthermore, although maximizing the $L_2$ norm yields a larger pixel-wise MSE, this advantage does not carry over to LPIPS, a metric more consistent with human vision.
  • Figure 4: Schematic diagram of the proposed knowledge-guided adversarial defense. We use knowledge guidance to assist the generation of protective adversarial noise, and then add the noise to the original input sample to interfere with the malicious manipulation model, causing it to produce obvious distortions and confusing semantics. The proposed method consists of two main components: a constraint based on domain-specific knowledge and a metric based on visual-perception knowledge. The former destroys important task-specific semantics in the fake samples (e.g., the face structure for the face manipulation task), and the latter disrupts perception-related features. Our method pursues both goals by jointly minimizing the domain-specific loss $\mathcal{L}_{dk}$ and the visual-perception loss $\mathcal{L}_{pk}$ to iteratively update the adversarial noise.
  • Figure 5: Examples of fake samples corrupted by different defense methods. The first column shows the original input samples, and the remaining five columns show fake samples produced by the malicious manipulation model StarGAN. We use three defense methods as baselines: ITD ruiz2020disrupting, TTFID huang2021initiative and TAFIM aneja2022tafim. The perturbations caused by the baselines are mainly concentrated in color textures, and the face contours beneath the abnormal textures remain relatively clear. In contrast, the face semantics of fake samples corrupted by our method are significantly perturbed, and the facial features become very confusing. In addition, we conducted a questionnaire to collect feedback on image distortion from the perspective of human vision (ranking scores range from 1 to 4; a higher score indicates stronger distortion).
  • ...and 6 more figures
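Figure 3's observation, that a large pixel-wise distortion can leave keypoints intact, is what motivates the domain-specific term $\ell_d(\mathcal{K}_d(\cdot), \mathcal{K}_d(\cdot))$. The sketch below assumes a hypothetical keypoint extractor (the brightest pixel in each cell of a 2x2 grid) in place of a real facial landmark detector, and uses mean keypoint displacement as $\ell_d$; the toy image and both distortions are also hypothetical. A uniform brightness shift produces a larger MSE than a 2-pixel horizontal shift of the content, yet only the latter moves the keypoints.

```python
import numpy as np


def toy_keypoints(img, grid=2):
    """Hypothetical K_d: (row, col) of the brightest pixel in each cell of a grid x grid layout."""
    h, w = img.shape
    ch, cw = h // grid, w // grid
    points = []
    for gi in range(grid):
        for gj in range(grid):
            cell = img[gi * ch:(gi + 1) * ch, gj * cw:(gj + 1) * cw]
            r, c = np.unravel_index(np.argmax(cell), cell.shape)
            points.append((gi * ch + r, gj * cw + c))
    return np.array(points, dtype=float)


def keypoint_distance(img_a, img_b):
    """ell_d (assumed): mean Euclidean displacement of corresponding keypoints."""
    ka, kb = toy_keypoints(img_a), toy_keypoints(img_b)
    return float(np.linalg.norm(ka - kb, axis=1).mean())


def mse(a, b):
    """Pixel-wise mean squared error."""
    return float(np.mean((a - b) ** 2))


# Toy 16x16 "face": dim background with one bright landmark per quadrant.
img = np.full((16, 16), 0.1)
for r, c in [(3, 4), (2, 11), (12, 5), (13, 12)]:
    img[r, c] = 1.0

texture_shift = img + 0.3                  # color-like change: large MSE, structure unchanged
structure_shift = np.roll(img, 2, axis=1)  # content moved 2 pixels to the right

print("texture:  ", mse(img, texture_shift), keypoint_distance(img, texture_shift))
print("structure:", mse(img, structure_shift), keypoint_distance(img, structure_shift))
```

A defense driven only by pixel-wise error would favor the brightness shift, while a keypoint-based $\ell_d$ rewards only the perturbation that actually displaces the structure.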