Table of Contents
Fetching ...

Target-Guided Adversarial Point Cloud Transformer Towards Recognition Against Real-world Corruptions

Jie Wang, Tingfa Xu, Lihe Ding, Jianan Li

TL;DR

This work introduces the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature erasing mechanism predicated on patterns discerned at each step during training.

Abstract

Achieving robust 3D perception in the face of corrupted data presents an challenging hurdle within 3D vision research. Contemporary transformer-based point cloud recognition models, albeit advanced, tend to overfit to specific patterns, consequently undermining their robustness against corruption. In this work, we introduce the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature erasing mechanism predicated on patterns discerned at each step during training. Specifically, APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor. The Adversarial Significance Identifier, is tasked with discerning token significance by integrating global contextual analysis, utilizing a structural salience index algorithm alongside an auxiliary supervisory mechanism. The Target-guided Promptor, is responsible for accentuating the propensity for token discard within the self-attention mechanism, utilizing the value derived above, consequently directing the model attention towards alternative segments in subsequent stages. By iteratively applying this strategy in multiple steps during training, the network progressively identifies and integrates an expanded array of object-associated patterns. Extensive experiments demonstrate that our method achieves state-of-the-art results on multiple corruption benchmarks.

Target-Guided Adversarial Point Cloud Transformer Towards Recognition Against Real-world Corruptions

TL;DR

This work introduces the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature erasing mechanism predicated on patterns discerned at each step during training.

Abstract

Achieving robust 3D perception in the face of corrupted data presents an challenging hurdle within 3D vision research. Contemporary transformer-based point cloud recognition models, albeit advanced, tend to overfit to specific patterns, consequently undermining their robustness against corruption. In this work, we introduce the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature erasing mechanism predicated on patterns discerned at each step during training. Specifically, APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor. The Adversarial Significance Identifier, is tasked with discerning token significance by integrating global contextual analysis, utilizing a structural salience index algorithm alongside an auxiliary supervisory mechanism. The Target-guided Promptor, is responsible for accentuating the propensity for token discard within the self-attention mechanism, utilizing the value derived above, consequently directing the model attention towards alternative segments in subsequent stages. By iteratively applying this strategy in multiple steps during training, the network progressively identifies and integrates an expanded array of object-associated patterns. Extensive experiments demonstrate that our method achieves state-of-the-art results on multiple corruption benchmarks.

Paper Structure

This paper contains 29 sections, 9 equations, 14 figures, 16 tables, 1 algorithm.

Figures (14)

  • Figure 1: Overall motivation. We advocate for the model to broaden its attention to diverse patterns, mitigating the tendency to overfit to localized patterns. The left segment of the figure contrasts the confusion matrices of the standard transformer with our approach. The right portion showcases the performances of both the standard transformer and our methodology when confronted with objects exhibiting similar local patterns. Tokens with high / low contributions to classification are in red / blue, respectively. Standard transformer tends to overfit to localized patterns. While our method, by modulating tokens with significant contributions, enables the model to garner features from a varied spectrum of target segments, thereby ensuring greater robustness.
  • Figure 2: (a) Process of progressive adversarial dropping. The first line means token learned in each stage. (b) Visualization of token weights learned by the classifier. Compared with standard transformer, ours has an advantage in mining sample patterns.
  • Figure 3: Overall architecture of our algorithm, composed of two key modules: Adversarial Significance Identifier and Target-guided Promptor. The former evaluates token significance within the context of the global perception, with the help of dominant feature indexing process from an auxiliary supervising loss that can bolster the precision of the index selection, then producing dropping rate for tokens. Subsquently, Target-guided Promptor enhances key dropout probabilities influenced by rate above, driving the model to explore auxiliary segments for pivotal information. This mechanism mitigates the propensity of the model to overfit to localized patterns.
  • Figure 4: Workflow of Adversarial Significance Identifier. Squares with light/dark blue means low/high values, respectively.
  • Figure 5: The proposed Target-guided Prompting mechanism that increases the dropout probability, based on the per-token dropping rate.
  • ...and 9 more figures