LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer

Yuxin Cao; Ziyu Zhao; Xi Xiao; Derui Wang; Minhui Xue; Jin Lu

TL;DR

LogoStyleFool exposes vulnerabilities in video recognition by introducing a subregional, logo-based style-transfer adversarial attack in a black-box setting. The method decouples style-referenced perturbations (via style-reference selection and RL-based logo style transfer) from post-RL perturbation optimization (LogoS-DCT) to achieve both targeted and untargeted attacks while preserving naturalness. It provides theoretical upper bounds for partial perturbations under both $\ell_\infty$ and $\ell_2$ norms. On UCF-101 and HMDB-51 with C3D and I3D, it outperforms PatchAttack, BSC, and Adv-watermark, and it maintains its performance against patch-based defenses. The work highlights new security concerns and motivates defenses against subregional, style-based perturbations in real-world video systems.

Abstract

Video recognition systems are vulnerable to adversarial examples. Recent studies show that style transfer-based and patch-based unrestricted perturbations can effectively improve attack efficiency. These attacks, however, face two main challenges: 1) Adding large stylized perturbations to all pixels reduces the naturalness of the video and such perturbations can be easily detected. 2) Patch-based video attacks are not extensible to targeted attacks due to the limited search space of reinforcement learning that has been widely used in video attacks recently. In this paper, we focus on the video black-box setting and propose a novel attack framework named LogoStyleFool by adding a stylized logo to the clean video. We separate the attack into three stages: style reference selection, reinforcement-learning-based logo style transfer, and perturbation optimization. We solve the first challenge by scaling down the perturbation range to a regional logo, while the second challenge is addressed by complementing an optimization stage after reinforcement learning. Experimental results substantiate the overall superiority of LogoStyleFool over three state-of-the-art patch-based attacks in terms of attack performance and semantic preservation. Meanwhile, LogoStyleFool still maintains its performance against two existing patch-based defense methods. We believe that our research is beneficial in increasing the attention of the security community to such subregional style transfer attacks.
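As a rough illustration of what "scaling down the perturbation range to a regional logo" means, the sketch below pastes a logo into one rectangular region of every frame and measures how much of each frame is altered. It is a minimal sketch, not the authors' implementation: the function names `paste_logo` and `perturbed_fraction`, the `(T, H, W, C)` array layout, and the fixed `top`/`left` position are all assumptions made for illustration. In the paper, the logo, its style, and its position are chosen by the style reference selection and reinforcement learning stages, which are not modeled here.

```python
import numpy as np


def paste_logo(video, logo, top, left):
    """Return a copy of `video` with `logo` overwriting one region of every frame.

    video: array of shape (T, H, W, C), values in [0, 1] (assumed layout)
    logo:  array of shape (h, w, C), values in [0, 1]
    top, left: hypothetical upper-left corner of the logo region
    """
    _, height, width, channels = video.shape
    h, w, c = logo.shape
    if c != channels:
        raise ValueError("logo and video must have the same number of channels")
    if not (0 <= top <= height - h and 0 <= left <= width - w):
        raise ValueError("logo region must lie inside the frame")

    out = video.copy()
    # The logo is broadcast over the time axis, so every frame gets the same patch.
    out[:, top:top + h, left:left + w, :] = logo
    return out


def perturbed_fraction(video, adv):
    """Fraction of pixel locations (over all frames) that differ from the clean video."""
    changed = np.any(video != adv, axis=-1)
    return float(changed.mean())


if __name__ == "__main__":
    clean = np.zeros((4, 8, 8, 3))   # toy 4-frame video
    logo = np.ones((2, 3, 3))        # toy 2x3 logo
    adv = paste_logo(clean, logo, top=1, left=2)
    print(perturbed_fraction(clean, adv))
```

Only the pixels under the logo change, so the altered area is bounded by the logo size rather than the full frame, which is the property the abstract relies on for naturalness.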

Paper Structure

This paper contains 18 sections, 6 theorems, 17 equations, 3 figures, 7 tables, 2 algorithms.

Introduction
Related Work
Methodology
Style Reference Selection
Reinforcement-Learning-Based Logo Style Transfer
Perturbation Optimization
LogoStyleFool Recap
Experiments
Experimental Setup
Experimental Results
Ablation Study
Defense Performance
Conclusion
Acknowledgments
LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer (Supplementary Material)
...and 3 more sections

Key Result

Proposition 1

The perturbation after $K$ steps can be expressed as […], where $\Psi(x) = \mathrm{clip}_{-\varepsilon}^{+\varepsilon}(x)$ under the $\ell_\infty$ restriction and $\Psi(x) = x$ under the $\ell_2$ restriction; $\mathrm{clip}$ denotes the clip operation that restricts the perturbation to the $\ell_\infty$ ball. $A$ represents the DCT transformation mat
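The operator $\Psi$ defined above can be sketched in a few lines of NumPy. This covers only $\Psi$ itself, as stated in the proposition: element-wise clipping to $[-\varepsilon, +\varepsilon]$ for the $\ell_\infty$ case and the identity for the $\ell_2$ case. The function name `psi` and the `norm` argument are assumptions for illustration, and the full $K$-step expression is not reproduced because it is missing from the text above.

```python
import numpy as np


def psi(x, eps, norm="linf"):
    """Sketch of the operator Psi from Proposition 1.

    norm="linf": clip each entry of x to [-eps, +eps]
    norm="l2":   return x unchanged (Psi is the identity in this case)
    """
    x = np.asarray(x, dtype=float)
    if norm == "linf":
        return np.clip(x, -eps, eps)
    if norm == "l2":
        return x
    raise ValueError("norm must be 'linf' or 'l2'")


if __name__ == "__main__":
    delta = np.array([-0.5, 0.01, 0.3])
    print(psi(delta, eps=0.1, norm="linf"))
    print(psi(delta, eps=0.1, norm="l2"))
```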

Figures (3)

Figure 1: Overview of our proposed LogoStyleFool.
Figure 2: Grad-CAM visualizations of LogoStyleFool. Top row: targeted, bottom row: untargeted.
Figure 3: Examples of different attacks.

Theorems & Definitions (8)

Proposition 1
Theorem 1
Lemma 1
Theorem 2
Definition 1
Lemma 2
Definition 2
Lemma 3
