6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun, Yuhao Yang, Jinlin Wu, Fu Zhang, Xinlei He

TL;DR

The paper introduces 6DAttack, a backdoor framework for 6DoF pose estimation that uses 3D object triggers to steer predictions toward attacker-defined poses while preserving performance on clean scenes. It designs both synthetic and real-object 3D triggers and demonstrates the attack on PnP-based and end-to-end pipelines across LINEMOD, YCB-Video, and CO3D, achieving a 100% attack success rate (ASR) and up to 97.7% ADD-P with minimal impact on clean accuracy. The evaluation also covers a simple fine-tuning defense, which fails to eliminate the backdoor; this exposes a critical security vulnerability in current 6DoF pose estimation approaches. These findings underscore the need for robust defenses and safer design practices for 6DoF systems in robotics, AR/VR, and autonomous platforms.
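The TL;DR describes the attack only at a high level. A common way to implant a backdoor of this kind is training-data poisoning: a small fraction of training scenes receives the trigger, and their pose labels are replaced with the attacker's target pose, while all other scenes keep their clean labels. The sketch below assumes that setup and is not the authors' pipeline. `Sample`, `poison_dataset`, `poison_rate`, and the single fixed target pose are hypothetical, and the `has_trigger` flag merely stands in for actually placing a 3D trigger object into the scene.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Sample:
    """One training scene with its 6DoF pose label (hypothetical structure)."""
    image_id: str
    has_trigger: bool   # stands in for placing a 3D trigger object in the scene
    rotation: Mat3      # pose label: rotation R
    translation: Vec3   # pose label: translation t


def poison_dataset(samples: List[Sample], poison_rate: float,
                   target_rotation: Mat3, target_translation: Vec3,
                   seed: int = 0) -> List[Sample]:
    """Return a new list in which a random `poison_rate` fraction of samples
    gets the trigger and has its pose label replaced by the target pose.
    The remaining samples keep their clean labels, which is intended to
    preserve clean-scene accuracy after training."""
    rng = random.Random(seed)
    n_poison = round(poison_rate * len(samples))
    poison_idx = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, s in enumerate(samples):
        if i in poison_idx:
            # dataclasses.replace builds a new Sample; the input is not mutated
            poisoned.append(replace(s, has_trigger=True,
                                    rotation=target_rotation,
                                    translation=target_translation))
        else:
            poisoned.append(s)  # clean sample, label untouched
    return poisoned


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
clean = [Sample(f"scene_{i:03d}", False, identity, (0.0, 0.0, 0.5)) for i in range(10)]
rot_z_90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
train_set = poison_dataset(clean, 0.3, rot_z_90, (0.1, 0.0, 0.8))
print(sum(s.has_trigger for s in train_set))  # 3
```

A fixed target pose keeps the example short; an attacker could just as well derive the target from each sample's true pose, which would only change the labels written into the poisoned samples.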

Abstract

Advances in deep learning have enabled accurate six-degree-of-freedom (6DoF) object pose estimation, which is widely used in robotics, AR/VR, and autonomous systems. However, backdoor attacks pose significant security risks to such models. Most backdoor research focuses on 2D vision, and 6DoF pose estimation remains largely unexplored. Unlike traditional backdoors, which only change the predicted class, 6DoF attacks must control continuous parameters such as translation and rotation, rendering 2D methods inapplicable. We propose 6DAttack, a framework that uses 3D object triggers to induce controlled erroneous poses while maintaining normal behavior on clean inputs. Evaluations on PVNet, DenseFusion, and PoseDiffusion across LINEMOD, YCB-Video, and CO3D show high attack success rates (ASRs) without compromising clean performance. Backdoored models achieve up to 100% clean ADD accuracy and 100% ASR, with triggered samples reaching 97.70% ADD-P. Furthermore, a representative fine-tuning defense remains ineffective. Our findings reveal a serious, underexplored threat to 6DoF pose estimation.
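The abstract reports ADD accuracy and ADD-P without defining them. ADD is the standard average-distance metric: the mean Euclidean distance between the object's 3D model points transformed by a reference pose and by the predicted pose, with a prediction usually counted as correct when ADD is below 10% of the object's diameter. The sketch below implements that test. It assumes, without confirmation from the paper, that ADD-P applies the same test with the attacker's target pose as the reference instead of the ground truth; the paper's exact definition may differ. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # (R, t)


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform R @ p + t to a model point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_distance(points: Sequence[Vec3], R_ref: Mat3, t_ref: Vec3,
                 R_pred: Mat3, t_pred: Vec3) -> float:
    """Mean distance between model points under the reference and predicted poses."""
    total = sum(math.dist(transform(R_ref, t_ref, p), transform(R_pred, t_pred, p))
                for p in points)
    return total / len(points)


def diameter(points: Sequence[Vec3]) -> float:
    """Largest pairwise distance between model points (O(n^2), fine for a sketch)."""
    return max(math.dist(a, b) for a in points for b in points)


def add_accuracy(points: Sequence[Vec3], refs: List[Pose], preds: List[Pose],
                 threshold: float = 0.1) -> float:
    """Fraction of predictions whose ADD is below threshold * diameter.
    Pass ground-truth poses as `refs` for clean ADD accuracy; passing the
    attacker's target poses instead gives the assumed ADD-P."""
    d = diameter(points)
    hits = sum(add_distance(points, *ref, *pred) < threshold * d
               for ref, pred in zip(refs, preds))
    return hits / len(refs)


# Usage: corners of a 10 cm cube, two predictions offset by 5 mm and 5 cm in x.
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cube = [(x, y, z) for x in (-0.05, 0.05) for y in (-0.05, 0.05) for z in (-0.05, 0.05)]
refs = [(I3, (0.0, 0.0, 0.5)), (I3, (0.0, 0.0, 0.5))]
preds = [(I3, (0.005, 0.0, 0.5)), (I3, (0.05, 0.0, 0.5))]
print(add_accuracy(cube, refs, preds))  # 0.5
```

The 10% threshold is the common convention; it scales the tolerance with object size, so the same error is judged more strictly on small objects.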

Paper Structure

This paper contains 16 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: $\mathsf{6DAttack}$ leaves predictions on clean scenes unchanged: given an untriggered scene (a), the model estimates the correct 6DoF pose (b). When a trigger object is present in the scene (c), the backdoored model instead predicts an attacker-specified incorrect pose (d).
  • Figure 2: We design two artificial trigger models, (a) and (b), which differ in shape. Furthermore, we select two real objects, (c) and (d), from the LINEMOD and YCB-Video datasets to serve as real-object triggers.
  • Figure 3: Overview of our attack framework $\mathsf{6DAttack}$.
  • Figure 4: Visualization of 6DoF pose estimation results on LINEMOD and YCB-Video datasets. Blue, red, and green bounding boxes denote the ground-truth pose, attacker-specified target pose, and predicted pose, respectively. (a, c) show results without triggers; (b, d) show results with triggers. (e) and (f) display the target and trigger objects. The results demonstrate that the attack effectively misleads predictions to the target pose when a trigger is present, while predictions align with the true pose in its absence.
  • Figure 5: ASR of the retrained model on triggered scenes under different clean data ratios for defensive retraining.
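Figure 4 compares the ground-truth, target, and predicted poses by drawing 3D bounding boxes. A common way to produce such overlays is to transform the eight corners of the object's model-space bounding box by a pose (R, t) and project them with a pinhole camera model. The sketch below assumes known intrinsics `fx`, `fy`, `cx`, `cy` and no lens distortion; `box_corners` and `project_box` are hypothetical helpers, not code from the paper.

```python
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def box_corners(min_xyz: Vec3, max_xyz: Vec3) -> List[Vec3]:
    """All 8 corners of an axis-aligned box in the object's model frame."""
    return [tuple(c) for c in product(*zip(min_xyz, max_xyz))]


def project_box(min_xyz: Vec3, max_xyz: Vec3, R: Mat3, t: Vec3,
                fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Transform box corners into the camera frame with (R, t), then apply the
    pinhole projection u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    pixels = []
    for p in box_corners(min_xyz, max_xyz):
        X, Y, Z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        if Z <= 0:
            raise ValueError("box corner lies behind the camera")
        pixels.append((fx * X / Z + cx, fy * Y / Z + cy))
    return pixels


# Usage: a 20 cm box 1 m in front of the camera, hypothetical intrinsics.
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
corners_2d = project_box((-0.1, -0.1, -0.1), (0.1, 0.1, 0.1), I3, (0.0, 0.0, 1.0),
                         fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Calling `project_box` once each with the ground-truth, target, and predicted pose, and connecting the corners along the box edges, yields three overlays like the blue, red, and green boxes described in Figure 4.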