SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang
TL;DR
This work tackles the challenge of generating imperceptible, targeted adversarial videos by exploiting spatio-temporal information exchange. It introduces SVASTIN, an architecture that combines a Spatio-Temporal Invertible Neural Network (STIN) with a Guided Target Video Learning (GTVL) module. Together they transfer discriminative content from a target class into a source video while preserving perceptual quality, using a 3D discrete wavelet transform (3D-DWT) decomposition and Spatio-Temporal Affine Coupling Blocks. The method optimizes an adversarial loss and a guidance loss to learn a target feature tensor and produce an adversarial video that misleads action classifiers with high fooling rates and low perceptual distortion, as demonstrated on UCF-101 and Kinetics-400 across multiple models. Overall, SVASTIN advances the design of sparse, imperceptible, targeted video attacks and provides a practical framework for evaluating the robustness of video-based DNNs. The code is publicly available, enabling replication and further exploration of spatio-temporal invertible approaches.
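The TL;DR mentions a 3D-DWT-based decomposition. As a rough illustration of what such a decomposition does to a clip, here is a minimal sketch that applies a single-level orthonormal Haar transform along time, height and width, splitting a video into eight sub-bands that can be inverted exactly. The Haar wavelet, the single decomposition level, the (T, H, W, C) layout and the helper names `haar_split`, `haar_merge`, `dwt3d` and `idwt3d` are all assumptions made for illustration; this is not SVASTIN's implementation.

```python
import numpy as np


def haar_split(x, axis):
    """One-level orthonormal Haar analysis along one axis (its length must be even)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    low = (even + odd) / np.sqrt(2.0)   # approximation coefficients
    high = (even - odd) / np.sqrt(2.0)  # detail coefficients
    return low, high


def haar_merge(low, high, axis):
    """Exact inverse of haar_split: rebuild even/odd samples and interleave them."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape, dtype=np.result_type(low, high))
    index = [slice(None)] * low.ndim
    index[axis] = slice(0, None, 2)
    out[tuple(index)] = even
    index[axis] = slice(1, None, 2)
    out[tuple(index)] = odd
    return out


def dwt3d(video):
    """Split a (T, H, W, C) video into 8 sub-bands keyed by time/height/width filter ('L' or 'H')."""
    bands = {"": video}
    for axis in (0, 1, 2):  # time, then height, then width
        next_bands = {}
        for key, arr in bands.items():
            low, high = haar_split(arr, axis)
            next_bands[key + "L"] = low
            next_bands[key + "H"] = high
        bands = next_bands
    return bands


def idwt3d(bands):
    """Reassemble the video from the 8 sub-bands produced by dwt3d."""
    for axis in (2, 1, 0):  # undo width, then height, then time
        bands = {
            key: haar_merge(bands[key + "L"], bands[key + "H"], axis)
            for key in {k[:-1] for k in bands}
        }
    return bands[""]


rng = np.random.default_rng(0)
video = rng.random((8, 32, 32, 3))        # 8 frames of 32x32 RGB
bands = dwt3d(video)
print(sorted(bands))                      # ['HHH', 'HHL', 'HLH', 'HLL', 'LHH', 'LHL', 'LLH', 'LLL']
print(bands["LLL"].shape)                 # (4, 16, 16, 3)
print(np.allclose(idwt3d(bands), video))  # True
```

Because every split is orthonormal, the transform is lossless and energy-preserving, which is the property an invertible pipeline relies on; the `LLL` band holds a scaled, half-resolution, half-frame-rate summary of the clip, while the other seven bands hold temporal and spatial detail.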
Abstract
Robust and imperceptible adversarial video attacks are challenging due to the spatial and temporal characteristics of videos. Existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN), which generates adversarial videos through information exchange in the spatio-temporal feature space. It consists of a Guided Target Video Learning (GTVL) module, which balances the perturbation budget against optimization speed, and a Spatio-Temporal Invertible Neural Network (STIN) module, which exchanges spatio-temporal feature-space information between a source video and the target feature tensor learned by the GTVL module. Extensive experiments on UCF-101 and Kinetics-400 demonstrate that SVASTIN generates adversarial examples with higher imperceptibility than state-of-the-art methods while also achieving a higher fooling rate. Code is available at https://github.com/Brittany-Chen/SVASTIN.
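The abstract describes STIN as exchanging information between a source video and a target feature tensor, and the TL;DR names Spatio-Temporal Affine Coupling Blocks as its building block. The following is a minimal sketch of a generic two-stream affine coupling, assuming an additive-then-affine update rule similar to the one used in the HiNet image-hiding network. The class name `AffineCoupling`, the random channel-mixing maps standing in for the learned sub-networks `phi`, `rho` and `eta`, and the `clamp` parameter are hypothetical. The sketch only demonstrates why such a block is exactly invertible; it is not the paper's architecture.

```python
import numpy as np


class AffineCoupling:
    """Hypothetical two-stream invertible affine coupling block.

    Stream 1 carries source-video features and stream 2 carries target features,
    both shaped (T, H, W, C). The sub-networks phi, rho and eta are stand-ins:
    fixed random channel-mixing maps instead of learned 3D convolutions.
    """

    def __init__(self, channels, seed=0, clamp=1.0):
        rng = np.random.default_rng(seed)
        self.w_phi = rng.normal(scale=0.1, size=(channels, channels))
        self.w_rho = rng.normal(scale=0.1, size=(channels, channels))
        self.w_eta = rng.normal(scale=0.1, size=(channels, channels))
        self.clamp = clamp  # bounds the log-scale so exp() stays well-conditioned

    @staticmethod
    def _mix(x, w):
        # Per-voxel channel mixing (like a 1x1x1 convolution) followed by tanh.
        return np.tanh(x @ w)

    def _log_scale(self, x):
        return self.clamp * self._mix(x, self.w_rho)

    def forward(self, x1, x2):
        # Additive update of stream 1 using stream 2, then affine update of stream 2.
        y1 = x1 + self._mix(x2, self.w_phi)
        y2 = x2 * np.exp(self._log_scale(y1)) + self._mix(y1, self.w_eta)
        return y1, y2

    def inverse(self, y1, y2):
        # Undo the two updates in reverse order; no sub-network needs to be inverted.
        x2 = (y2 - self._mix(y1, self.w_eta)) * np.exp(-self._log_scale(y1))
        x1 = y1 - self._mix(x2, self.w_phi)
        return x1, x2


rng = np.random.default_rng(1)
source = rng.random((4, 16, 16, 24))  # e.g. 8 wavelet sub-bands x 3 colour channels
target = rng.random((4, 16, 16, 24))
block = AffineCoupling(channels=24)
y1, y2 = block.forward(source, target)
x1, x2 = block.inverse(y1, y2)
print(np.allclose(x1, source), np.allclose(x2, target))  # True True
```

The design choice worth noting is that the sub-networks are only ever evaluated forward, so they can be arbitrary (non-invertible) networks while the block as a whole stays exactly invertible. In a hiding-style design, the first output stream would then be mapped back to pixel space as the adversarial video; that mapping is likewise an assumption here.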
