Attack-Augmentation Mixing-Contrastive Skeletal Representation Learning

Binqian Xu, Xiangbo Shu, Jiachao Zhang, Rui Yan, Guo-Sen Xie

TL;DR

This work proposes a novel Attack-Augmentation Mixing-Contrastive skeletal representation learning (A$^2$MC) to contrast hard positive features and hard negative features for learning more robust skeleton representations.

Abstract

Contrastive learning, relying on effective positive and negative sample pairs, is beneficial for learning informative skeleton representations in unsupervised skeleton-based action recognition. To obtain these positive and negative pairs, existing weak/strong data augmentation methods have to randomly change the appearance of skeletons to indirectly pursue semantic perturbations. However, such approaches have two limitations: i) solely perturbing appearance cannot capture the intrinsic semantic information of skeletons well, and ii) random perturbation may change the original positive/negative pairs to soft positive/negative ones. To address the above dilemma, we make the first attempt to explore an attack-based augmentation scheme that additionally brings in direct semantic perturbation, for constructing hard positive pairs and further assisting in constructing hard negative pairs. In particular, we propose a novel Attack-Augmentation Mixing-Contrastive skeletal representation learning (A$^2$MC) to contrast hard positive features and hard negative features for learning more robust skeleton representations. In A$^2$MC, Attack-Augmentation (Att-Aug) is designed to collaboratively perform targeted and untargeted perturbations of skeletons via attack and augmentation respectively, for generating high-quality hard positive features. Meanwhile, Positive-Negative Mixer (PNM) is presented to mix hard positive features and negative features for generating hard negative features, which are adopted for updating the mixed memory banks. Extensive experiments on three public datasets demonstrate that A$^2$MC is competitive with the state-of-the-art methods. The code will be available at A$^2$MC (https://github.com/1xbq1/A2MC).
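
The idea of mixing a hard positive feature with memory-bank negatives to obtain harder negatives can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the convex-combination mixing rule, the coefficient `lam`, the temperature `tau`, the InfoNCE-style loss, and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mix_negatives(positive, memory_bank, lam=0.5):
    """Hypothetical mixer: blend the positive feature into every negative.

    Each mixed feature is lam * positive + (1 - lam) * negative, re-normalized.
    This convex combination is an illustrative assumption, not the paper's PNM.
    """
    mixed = []
    for neg in memory_bank:
        blended = [lam * p + (1.0 - lam) * n for p, n in zip(positive, neg)]
        mixed.append(l2_normalize(blended))
    return mixed


def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE-style loss: negative log-softmax of the positive logit."""
    logits = [dot(query, positive) / tau]
    logits += [dot(query, n) / tau for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    query = l2_normalize([1.0, 0.0])
    positive = l2_normalize([0.9, 0.1])
    bank = [l2_normalize([0.0, 1.0]), l2_normalize([-0.6, 0.8])]

    hard_bank = mix_negatives(positive, bank, lam=0.5)
    print("loss with plain negatives:", info_nce(query, positive, bank))
    print("loss with mixed negatives:", info_nce(query, positive, hard_bank))
```

In this toy setting the mixed negatives lie closer to the query than the original ones, so the loss is larger; that is the sense in which such negatives are "hard".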

Paper Structure (14 sections, 11 equations, 11 figures, 10 tables)


Figures (11)

  • Figure 1: Main idea of this work. Previous weak/strong augmentations randomly construct positive features for the anchor while directly treating them as negative features in the memory bank. Our work introduces attack-augmentation ("Att": attacking features to move them toward the semantic boundary; "Aug": randomly augmenting features for diversity) to generate hard positive features, and further constructs positive-negative mixed features as hard negative features. Compared with ordinary positive/negative features, our hard positive/negative features are more beneficial for contrastive learning.
  • Figure 2: Framework of the proposed A$^2$MC. Given the skeleton input, Att-Aug produces two types of attacked features $f_1/f_2$ (weak and strong versions) as hard positive features (§\ref{sec:att-aug}), while features $f_0/f_3$ are obtained via basic augmentation and the key/query encoders (§\ref{sec:con}). PNM mixes features $f_1/f_2/f_3$ with memory bank $M_0$ to generate mixed memory banks $M_1/M_2/M_3$, where $M_0$ is an ensemble of adversarial negative features that is initialized from features $f_0$ and updated by gradient (§\ref{sec:pnm}). In MC, the similarity distributions calculated from $f_1$ and $f_3$ are each pulled toward the one-hot distribution, and the similarity distribution calculated from $f_2$ is pulled toward the one calculated from $f_3$ (§\ref{sec:mc}).
  • Figure 3: Visualization of attack-augmentation on NTU-60. For each group, given the original skeleton sequence, its skeletons and feature distributions after attack, weak attack-augmentation, and strong attack-augmentation are visualized in turn. Here, we use an FC layer to obtain the feature distribution, and the top-5 bars are highlighted.
  • Figure 4: t-SNE visualization on NTU-60. Ten action classes are randomly selected and reported. (a) w/ weak. (b) w/ weak & strong. (c) w/ attack. (d) A$^2$MC.
  • Figure 5: Linear evaluation results for different learning rates $\epsilon$ and scalar values $\eta$ in Att-Aug on NTU-60 (x-view). (a) The learning rate $\epsilon$. (b) The scalar value $\eta$.
  • ...and 6 more figures
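
The mixing-contrastive step described in the Figure 2 caption (the similarity distributions from $f_1$ and $f_3$ pulled toward a one-hot target, and the one from $f_2$ pulled toward the one from $f_3$) can be sketched roughly as follows. This is only an illustration under stated assumptions: the softmax form of the similarity distribution, the temperature `tau`, the use of KL divergence for the distribution-to-distribution term, the single shared bank (the caption uses separate banks $M_1/M_2/M_3$), and all function names are hypothetical and not confirmed by the source.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(feature, key, bank, tau=0.2):
    """Softmax over similarities to the key (index 0) and to each bank entry."""
    logits = [dot(feature, key) / tau] + [dot(feature, n) / tau for n in bank]
    return softmax(logits)


def one_hot_loss(dist):
    """Cross-entropy against a one-hot target that selects index 0."""
    return -math.log(dist[0])


def kl_divergence(target, pred):
    """KL(target || pred); assumed here as the distribution-matching term."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0.0)


if __name__ == "__main__":
    # Toy unit-length features standing in for f0 (key), f1, f2, f3.
    f0 = [1.0, 0.0]
    f1 = [0.8, 0.6]
    f2 = [0.6, 0.8]
    f3 = [0.96, 0.28]
    bank = [[0.0, 1.0], [-1.0, 0.0]]  # simplification: one bank for all features

    p1 = similarity_distribution(f1, f0, bank)
    p2 = similarity_distribution(f2, f0, bank)
    p3 = similarity_distribution(f3, f0, bank)

    total = one_hot_loss(p1) + one_hot_loss(p3) + kl_divergence(p3, p2)
    print("total sketch loss:", total)
```

Using the distribution from $f_3$ as a soft target for $f_2$, rather than a one-hot target, is one way to avoid forcing a strongly perturbed feature onto an overly confident label.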