PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking

Jiahuan Long, Tingsong Jiang, Wen Yao, Shuai Jia, Weijia Zhang, Weien Zhou, Chao Ma, Xiaoqian Chen

TL;DR

PapMOT exposes the vulnerability of multiple object tracking (MOT) systems to physical adversarial patches by generating printable patches that disrupt both detection and cross-frame identity association. The method has two phases: a patch training phase, which uses Expectation over Transformation (EOT) as data augmentation, and a patch attack phase, in which the trained patch is deployed. Training is guided by three losses, Bounding Box Restriction ($\mathcal{L}_{bbr}$), Total Variation ($\mathcal{L}_{tv}$), and Average Precision ($\mathcal{L}_{ap}$), combined into a unified objective, and a patch enhancement strategy further degrades temporal consistency. New evaluation metrics (TASR, IOR, STASR) assess the joint impact on detection and tracking. Experiments on MOT15/17/20 and BDD100K show that the attack is effective in the digital domain, and printed patches are validated in the real world under varied illumination, distance, and angle. These results reveal MOT vulnerabilities and motivate more robust detectors and data-association methods for safety-critical applications.
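To make two of the ingredients named above more concrete, the following is a minimal sketch of a total variation penalty and an EOT-style averaged loss. It is not the authors' implementation: this summary does not give the exact form of $\mathcal{L}_{tv}$ or the set of transformations, so the mean-absolute-difference definition, the brightness and noise ranges, and the helper names `total_variation`, `random_transform`, and `eot_loss` are all assumptions chosen for illustration.

```python
import random


def total_variation(patch):
    """Mean absolute difference between neighbouring pixels.

    `patch` is a 2-D list of grayscale values in [0, 1]. This is one common
    form of a total variation penalty (assumed here, not taken from the paper).
    """
    h, w = len(patch), len(patch[0])
    diffs = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                diffs.append(abs(patch[y][x + 1] - patch[y][x]))
            if y + 1 < h:  # vertical neighbour
                diffs.append(abs(patch[y + 1][x] - patch[y][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def random_transform(patch, rng):
    """Hypothetical EOT transform: random brightness scale plus small noise."""
    scale = rng.uniform(0.8, 1.2)  # assumed brightness range
    return [
        [min(1.0, max(0.0, p * scale + rng.gauss(0.0, 0.02))) for p in row]
        for row in patch
    ]


def eot_loss(patch, loss_fn, n_samples=16, seed=0):
    """Average `loss_fn` over randomly transformed copies of the patch."""
    rng = random.Random(seed)
    total = sum(loss_fn(random_transform(patch, rng)) for _ in range(n_samples))
    return total / n_samples


# Two toy patches: a flat one and a checkerboard.
smooth = [[0.5] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
```

On these toy inputs the flat patch has zero total variation and the checkerboard has the maximum, which is why such a penalty tends to favour smooth patches that survive printing. In a real pipeline the averaged loss would be a detector- or tracker-based objective rather than the penalty itself, and the transformations would typically include geometric changes as well.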

Abstract

Tracking multiple objects in a continuous video stream is crucial for many computer vision tasks. It involves detecting objects and associating them with their respective identities across successive frames. Despite significant progress in multiple object tracking (MOT), recent studies have revealed the vulnerability of existing MOT methods to adversarial attacks. However, all of these are digital attacks that inject pixel-level noise into input images and are therefore ineffective in physical scenarios. To fill this gap, we propose PapMOT, which generates physical adversarial patches against MOT for both digital and physical scenarios. Besides attacking the detection mechanism, PapMOT optimizes a printable patch that is itself detected as new targets, misleading the identity association process. Moreover, we introduce a patch enhancement strategy that further degrades the temporal consistency of tracking results across video frames, resulting in more aggressive attacks. We further develop new evaluation metrics to assess the robustness of MOT against such attacks. Extensive evaluations on multiple datasets demonstrate that PapMOT can successfully attack MOT trackers of various architectures in digital scenarios. We also validate the effectiveness of PapMOT for physical attacks by deploying printed adversarial patches in the real world.

Paper Structure

This paper contains 30 sections, 4 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: A comparison of various adversarial attack methods against MOT. (a) shows a digital attack (Lin et al., 2021) against MOT, where pixel-level noise is added to each frame and the identities of two people are switched within ten frames. (b) shows a patch attack (Thys et al., 2019) that fools only the detectors of MOT, generating new fake identities. (c) demonstrates that our patch attack fools both the detectors and the trackers of MOT simultaneously, creating new identities and also altering the identities of persons.
  • Figure 2: The overall framework of PapMOT. (a) In the patch training phase, the objective is to optimize an adversarial patch to achieve effective attacks in both digital and physical domains. During this phase, Expectation over Transformation (EOT) is employed as data augmentation for patches. (b) Subsequently, the well-trained adversarial patch is deployed in the patch attack phase to disrupt object detection and data association in MOT.
  • Figure 3: A comparison of existing and our proposed evaluation metrics for MOT attacks. (a) Previous evaluation metrics are identity-based and only consider ID association. (b) Our proposed metrics emphasize the attack evaluation of both detection and ID association.
  • Figure 4: Black-box transfer attacks with different PapMOT patches in the real world. The attack is effective for (a) a single pedestrian, (b) two overlapping pedestrians, and (c) a pedestrian walking past behind a fixed patch.
  • Figure 5: Studies on patch effectiveness at varying illuminations, distances, and angles.
  • ...and 1 more figure