TenAd: A Tensor-based Low-rank Black Box Adversarial Attack for Video Classification

Kimia Haghjooei, Mansoor Rezghi

TL;DR

This paper addresses the vulnerability of video classifiers to black-box adversarial attacks by exploiting the intrinsic multi-dimensional structure of video data. It introduces TenAd, a tensor-based low-rank attack that models the perturbation as a rank-constrained tensor added to a video tensor $\mathcal{X} \in \mathbb{R}^{W \times H \times C \times T}$, reducing the search space from $O(WHCT)$ to $O(W+H+C+T)$ and enabling efficient hard-label attacks via zeroth-order optimization. The method represents the perturbation as a rank-1 tensor and optimizes the per-mode components $\theta^{(j)}$, initialized from CP/Tucker factors, to achieve high attack success with imperceptible changes, outperforming state-of-the-art video black-box attacks in mean query count and perceptual metrics. Results on UCF-101 and HMDB-51 show that TenAd produces imperceptible perturbations while maintaining strong fooling rates, underscoring the value of tensor-based approaches for robust, scalable adversarial attacks in video domains.
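To make the search-space reduction concrete, here is a minimal NumPy sketch of a rank-1 perturbation built as the outer product of one vector per mode. The video size (112×112 RGB, 16 frames), the random factors, and the helper name `rank1_perturbation` are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def rank1_perturbation(factors):
    """Outer product of one vector per mode, giving a rank-1 4th-order tensor."""
    w, h, c, t = factors
    return np.einsum("i,j,k,l->ijkl", w, h, c, t)

# Hypothetical video dimensions (assumption, not taken from the paper).
W, H, C, T = 112, 112, 3, 16

# One factor vector per mode: width, height, channel, time.
rng = np.random.default_rng(0)
theta = [rng.standard_normal(n) for n in (W, H, C, T)]

delta = rank1_perturbation(theta)

# Free parameters: every entry of a dense perturbation vs. the per-mode factors.
full_params = W * H * C * T      # 602112
rank1_params = W + H + C + T     # 243

print(delta.shape, full_params, rank1_params)
```

With these assumed dimensions, the attacker searches over 243 numbers instead of 602,112, which is the $O(WHCT)$ to $O(W+H+C+T)$ reduction described above.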

Abstract

Deep learning models have achieved remarkable success in computer vision but remain vulnerable to adversarial attacks, particularly in black-box settings where model details are unknown. Existing adversarial attack methods (even those that work with key frames) often treat video data as simple vectors, ignoring its inherent multi-dimensional structure, and require a large number of queries, making them inefficient and detectable. In this paper, we propose TenAd, a novel tensor-based low-rank adversarial attack that leverages the multi-dimensional properties of video data by representing videos as fourth-order tensors. By restricting the perturbation to a low-rank tensor, our method significantly reduces the search space and the number of queries needed to generate adversarial examples in black-box settings. Experimental results on standard video classification datasets demonstrate that TenAd effectively generates imperceptible adversarial perturbations while achieving higher attack success rates and query efficiency than state-of-the-art methods. Our approach outperforms existing black-box adversarial attacks in terms of success rate, query efficiency, and perturbation imperceptibility, highlighting the potential of tensor-based methods for adversarial attacks on video models.
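The query cost mentioned in the abstract can be illustrated with a toy hard-label setting, where the attacker sees only the predicted class and every call to the model counts as one query. The sketch below binary-searches the smallest scale at which a fixed rank-1 direction changes the label, a step commonly used in hard-label attacks. Everything here is an assumption for illustration: the stand-in classifier `toy_predict`, the `HardLabelOracle` wrapper, the all-ones factors, and the tiny video size. It is not the TenAd algorithm itself.

```python
import numpy as np

class HardLabelOracle:
    """Exposes only the predicted label and counts how many queries were made."""
    def __init__(self, predict):
        self._predict = predict
        self.queries = 0

    def __call__(self, video):
        self.queries += 1
        return self._predict(video)

def toy_predict(video):
    # Stand-in classifier (assumption): class 1 if mean intensity exceeds 0.5.
    return int(video.mean() > 0.5)

def rank1_direction(factors):
    """Outer product of one vector per mode (width, height, channel, time)."""
    w, h, c, t = factors
    return np.einsum("i,j,k,l->ijkl", w, h, c, t)

def boundary_scale(oracle, video, direction, true_label, hi=1.0, tol=1e-3):
    """Binary-search the smallest scale s with label(video + s*direction) != true_label."""
    if oracle(video + hi * direction) == true_label:
        return None  # this direction does not cross the boundary within `hi`
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle(video + mid * direction) == true_label:
            lo = mid   # still correctly classified: move the lower bound up
        else:
            hi = mid   # misclassified: tighten the upper bound
    return hi

# Tiny hypothetical video with constant intensity 0.4 (assumption).
W, H, C, T = 8, 8, 3, 4
video = np.full((W, H, C, T), 0.4)
direction = rank1_direction([np.ones(n) for n in (W, H, C, T)])

oracle = HardLabelOracle(toy_predict)
true_label = toy_predict(video)  # computed offline, not counted as a query
scale = boundary_scale(oracle, video, direction, true_label)
print(scale, oracle.queries)
```

In this toy setup the label flips once the added intensity passes 0.1, so the search settles just above that value. A full attack would also update the per-mode factors to shrink this boundary distance, spending further queries on each update.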

Paper Structure

This paper contains 8 sections, 43 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Frames of an adversarial example generated by SVA [sva]. The first frame is an adversarial frame (keyframe), while the others (non-keyframes) remain unchanged. The difference between the keyframe and the other frames is visually detectable.
  • Figure 2: After Final Review will be added
  • Figure 3: Generating imperceptible rank-1 adversarial perturbations on 4-frame videos. The images above are the clean frames, while the images below are adversarial frames generated by introducing rank-1 perturbations.
  • Figure 4: Low-rank perturbation generated by the TenAd adversarial attack for selected frames.
  • Figure 5: Adversarial frames generated by TenAd, Heuristic [heu], and SVA [sva].