Towards Robust 3D Pose Transfer with Adversarial Learning

Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao

TL;DR

The paper addresses the challenge of robust 3D pose transfer by reducing dependence on clean pose sources and preprocessing, introducing adversarial training and a novel 3D-PoseMAE architecture. It combines on-the-fly adversarial sample generation with a multi-scale masked encoder and channel-wise attention to learn extrinsic pose representations and enable end-to-end transfer on noisy inputs and raw scans. Experimental results across SMPL-NPT, FAUST, and DFAUST demonstrate state-of-the-art performance on clean data and substantially improved robustness to real-world noise and scans, with qualitative evidence of strong generalization. The approach offers practical implications for real-time 3D pose transfer in unconstrained environments and provides insights into adversarial perturbations in 3D generative tasks.

Abstract

3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks. Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources. However, obtaining those clean pose sources requires cumbersome pre-processing pipelines, which hinders real-time applications. This work is driven by the intuition that the robustness of the model can be enhanced by introducing adversarial samples into training, making the model less vulnerable to noisy inputs; this can even be extended to directly handling real-world data such as raw point clouds and scans without intermediate processing. Furthermore, we propose a novel 3D pose Masked Autoencoder (3D-PoseMAE), a customized MAE that effectively learns 3D extrinsic representations (i.e., pose). 3D-PoseMAE facilitates learning of extrinsic attributes by simultaneously generating adversarial samples that perturb the model and learning from arbitrary raw noisy poses via a multi-scale masking strategy. Both qualitative and quantitative studies show that the meshes transferred by our network are of much better quality. In addition, we demonstrate the strong generalizability of our method across various poses, different domains, and even raw scans. Experimental results also show that the intermediate adversarial samples generated during training can successfully attack existing pose transfer models.

Paper Structure

This paper contains 10 sections, 8 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Examples of our 3D pose transfer results on various pose sources, showing strong robustness and generalizability. The pose sources include a clean mesh (top left) and a point cloud with Gaussian noise (bottom left) from the SMPL-NPT dataset [NPT], an adversarial point cloud sample generated by our method (top right), and a raw scan (bottom right) from the DFAUST dataset [DFAUST]. Identity meshes are from the SMPL-NPT dataset [NPT] and the FAUST dataset [FAUST] (bottom right). Our method achieves promising pose transfer performance even on the extremely challenging incomplete raw scan (bottom right). See more results and details in the Supplementary Materials.
  • Figure 2: Left: Existing methods can deal with simple perturbations such as Gaussian noise but fail to handle the harder inputs found in real-world cases. Middle: The traditional pipeline used in previous methods [NPT, 3dnpt, gct, skeletonfree] for 3D pose transfer. The model is trained with clean mesh inputs without considering robustness to noisy inputs. We use the symbol $\sim$ to refer generally to the loss term, which differs according to the actual condition. Right: Our method, which uses adversarial learning to enhance the robustness and generalizability of the model. It consists of an adversarial sample generating flow (top part, in red) and a pose transferring flow (bottom part, in green). The two flows alternate iteratively during adversarial training, and the adversarial samples are calculated on the fly. Note that $M_{id}$, $M_{pose}$, $M_{result}$, and $M_{GT}$ stand for the identity meshes, pose meshes, generated meshes, and ground truths, respectively; the same notation is used below.
  • Figure 3: An overview of our 3D-PoseMAE. The left part shows the whole architecture of the 3D-PoseMAE. The middle and right parts illustrate the architectural details of one multi-scale masked encoder and one 3D-PoseMAE decoder, respectively. The 3D-PoseMAE borrows the idea of MAE [mae] but extends it extensively to 3D data processing, especially for the 3D pose transfer task. Note that $Z$ stands for the encoded pose feature. $Z_{pose}$ and $Z_{id}$ stand for the specific encoded pose features from the pose and identity inputs. Subscripts denote the dimensional shapes of the variables.
  • Figure 4: The performance of our method and the compared method [NPT] on an unseen raw scan from the DFAUST dataset [DFAUST]. The compared method fails to handle the raw scan as a source pose and generates an arbitrary pose, while our method preserves the original pose with better visual quality.
  • Figure 5: Visualization of latent pose space and the corresponding poses. Please zoom in for details due to the page limit.
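
The Figure 2 caption describes two flows that alternate during training: one generates adversarial pose samples on the fly, the other updates the pose transfer model on them. The loop below is a minimal sketch of that alternation only, under loudly stated assumptions: the "model" is a toy two-parameter affine mix rather than 3D-PoseMAE, the attack is a single FGSM-style sign-gradient step (the source does not say which attack is used), and `eps`, `lr`, `predict`, `adversarial_pose`, and `train_step` are hypothetical names chosen for illustration. Only the symbols `M_pose`, `M_id`, and `M_GT` come from the caption.

```python
import random

def predict(params, pose, identity):
    """Toy stand-in for a pose transfer network: a per-coordinate affine mix."""
    a, b = params
    return [a * p + b * i for p, i in zip(pose, identity)]

def mse(pred, target):
    """Mean squared error between two flat coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / len(pred)

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def adversarial_pose(params, pose, identity, target, eps):
    """Adversarial flow: one FGSM-style step on the pose input (assumed attack)."""
    a, _ = params
    pred = predict(params, pose, identity)
    n = len(pose)
    # Analytic gradient of the MSE loss with respect to each pose coordinate.
    grad = [2.0 * a * (x - y) / n for x, y in zip(pred, target)]
    # Move every coordinate by eps in the direction that increases the loss.
    return [p + eps * sign(g) for p, g in zip(pose, grad)]

def train_step(params, pose, identity, target, lr):
    """Pose transfer flow: one gradient-descent update of the toy parameters."""
    a, b = params
    pred = predict(params, pose, identity)
    n = len(pose)
    grad_a = sum(2.0 * (x - y) * p for x, y, p in zip(pred, target, pose)) / n
    grad_b = sum(2.0 * (x - y) * i for x, y, i in zip(pred, target, identity)) / n
    return (a - lr * grad_a, b - lr * grad_b)

def adversarial_training(pose, identity, target, steps=50, eps=0.05, lr=0.1):
    """Alternate the two flows; adversarial samples are recomputed every step."""
    params = (0.5, 0.5)  # arbitrary initial parameters for the toy model
    for _ in range(steps):
        pose_adv = adversarial_pose(params, pose, identity, target, eps)
        params = train_step(params, pose_adv, identity, target, lr)
    return params

# Synthetic flat coordinate lists standing in for meshes (not real data).
rng = random.Random(0)
M_pose = [rng.uniform(-1, 1) for _ in range(30)]
M_id = [rng.uniform(-1, 1) for _ in range(30)]
M_GT = [0.8 * p + 0.2 * i for p, i in zip(M_pose, M_id)]
trained = adversarial_training(M_pose, M_id, M_GT)
```

Because the adversarial sample is recomputed from the current parameters at every step, no perturbed dataset has to be stored, which is the sense in which the samples are produced on the fly. The ground truth stays clean throughout, so the model is always asked to produce the clean result from a perturbed pose input.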