
Adversarial Watermarking for Face Recognition

Yuguang Yao, Anil Jain, Sijia Liu

TL;DR

This study introduces a novel threat model, the adversarial watermarking attack. The attack remains stealthy in the absence of watermarking, so images are initially recognized correctly; once watermarking is applied, the attack is activated and recognition fails.

Abstract

Watermarking is an essential technique for embedding an identifier (i.e., watermark message) within digital images to assert ownership and monitor unauthorized alterations. In face recognition systems, watermarking plays a pivotal role in ensuring data integrity and security. However, an adversary could potentially interfere with the watermarking process, significantly impairing recognition performance. We explore the interaction between watermarking and adversarial attacks on face recognition models. Our findings reveal that while watermarking or input-level perturbation alone may have a negligible effect on recognition accuracy, the combined effect of watermarking and perturbation can result in an adversarial watermarking attack, significantly degrading recognition performance. Specifically, we introduce a novel threat model, the adversarial watermarking attack, which remains stealthy in the absence of watermarking, allowing images to be correctly recognized initially. However, once watermarking is applied, the attack is activated, causing recognition failures. Our study reveals a previously unrecognized vulnerability: adversarial perturbations can exploit the watermark message to evade face recognition systems. Evaluated on the CASIA-WebFace dataset, our proposed adversarial watermarking attack reduces face matching accuracy by 67.2% at an $\ell_\infty$ perturbation strength of $2/255$ and by 95.9% at a strength of $4/255$.
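To make the perturbation strengths quoted above concrete, the following minimal sketch shows what an $\ell_\infty$ budget of $2/255$ means for pixel values in $[0, 1]$: every pixel of the perturbed image may differ from the original by at most $\epsilon$. The function names `project_linf` and `linf_norm`, and the four-pixel "image", are hypothetical and used only for illustration; this is not the paper's code.

```python
def project_linf(image, candidate, eps):
    """Project `candidate` onto the l_inf ball of radius eps around `image`.

    Both inputs are flat lists of pixel values in [0, 1] (an assumption
    made for this sketch). The result also stays inside [0, 1].
    """
    projected = []
    for x, x_adv in zip(image, candidate):
        delta = max(-eps, min(eps, x_adv - x))          # enforce |delta| <= eps
        projected.append(max(0.0, min(1.0, x + delta)))  # keep a valid pixel value
    return projected


def linf_norm(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(x - y) for x, y in zip(a, b))


eps = 2 / 255                         # perturbation strength quoted in the abstract
image = [0.0, 0.5, 1.0, 0.25]         # toy "image" with four pixels
candidate = [0.1, 0.49, 0.9, 0.25]    # arbitrary over-budget perturbed image
adv = project_linf(image, candidate, eps)
print(linf_norm(image, adv) <= eps + 1e-12)
```

An iterative attack would typically apply such a projection after each update step so that the final perturbation never exceeds the budget; whether the paper uses this exact procedure is not stated in this section.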

Paper Structure (5 sections, 4 equations, 3 figures, 2 tables)


Figures (3)

  • Figure 1: Overview of the Adversarial Watermarking Attack on Face Recognition. The green path (A) represents the standard watermarking and face recognition process, where the probe face is watermarked using the watermark encoder and correctly matched with the reference face after feature extraction. The yellow path (B) shows input-level adversarial perturbations applied to evade the face recognition system without watermarking. Subtle adversarial perturbations are added to the probe face, but they do not affect the recognition result without watermarking. The red path (C) demonstrates the adversarial watermarking process, where the adversarially perturbed face image, after being watermarked, fails to match the reference face.
  • Figure 2: Violin plots of the similarity scores (as defined in the paper's similarity equation) at different $\epsilon$ values (scaled by $1/255$). For each $\epsilon$, the violin plot shows the distribution of similarity scores between perturbed probe and reference images under two conditions: with watermarking (blue) and without watermarking (red). The perturbation strength is controlled by varying $\epsilon$ in the constraint $\|\delta\|_\infty \leq \epsilon$.
  • Figure 3: Visualization of reference, probe, and perturbed/watermarked face images, along with the corresponding perturbation/watermark, for four identities. (a) Reference face. (b) Probe face. (c) Watermarked face. (d) Difference between (b) and (c). (e) Perturbed face. (f) Difference between (b) and (e). (g) Adversarially watermarked face, obtained by watermarking the perturbed face. (h) Difference between (b) and (g). All element-wise absolute differences are scaled by $\times 10$ and color-inverted. Each probe face is labeled at the top with its similarity score against the reference face.
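
The three paths of Figure 1 can be sketched as one evaluation routine that scores the same probe under watermarking only (A), perturbation only (B), and perturbation followed by watermarking (C). This is a minimal sketch under stated assumptions: the similarity score is taken to be cosine similarity, matching is a simple threshold test, and `toy_watermark`, `toy_extract`, the two-element "images", and the threshold `0.9` are hypothetical stand-ins with arbitrary numbers chosen only so the paths produce different outcomes. They say nothing about the behavior of real watermark encoders or face recognition models.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def evaluate_paths(probe, delta, reference, watermark, extract, threshold):
    """Return {path: (similarity, matched)} for paths A, B, and C."""
    perturbed = [p + d for p, d in zip(probe, delta)]
    inputs = {
        "A_watermark_only": watermark(probe),
        "B_perturbation_only": perturbed,
        "C_perturb_then_watermark": watermark(perturbed),
    }
    ref_features = extract(reference)
    results = {}
    for name, image in inputs.items():
        score = cosine_similarity(extract(image), ref_features)
        results[name] = (score, score >= threshold)
    return results


# Hypothetical stand-ins, NOT real models: an additive "watermark" and an
# identity "feature extractor" acting on two-element toy images.
def toy_watermark(image):
    return [image[0], image[1] + 0.3]


def toy_extract(image):
    return list(image)


results = evaluate_paths(
    probe=[1.0, 0.1],
    delta=[0.0, 0.3],          # arbitrary toy perturbation
    reference=[1.0, 0.0],
    watermark=toy_watermark,
    extract=toy_extract,
    threshold=0.9,             # arbitrary toy matching threshold
)
for name, (score, matched) in results.items():
    print(name, round(score, 3), matched)
```

In this toy setup, paths A and B stay above the threshold while path C falls below it, which mirrors the qualitative pattern described in the caption. With real components, `watermark` and `extract` would be replaced by the actual watermark encoder and face feature extractor.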