Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao
TL;DR
The paper addresses the challenge of robust 3D pose transfer by reducing dependence on clean pose sources and preprocessing, introducing adversarial training and a novel 3D-PoseMAE architecture. It combines on-the-fly adversarial sample generation with a multi-scale masked encoder and channel-wise attention to learn extrinsic pose representations and enable end-to-end transfer on noisy inputs and raw scans. Experimental results across SMPL-NPT, FAUST, and DFAUST demonstrate state-of-the-art performance on clean data and substantially improved robustness to real-world noise and scans, with qualitative evidence of strong generalization. The approach offers practical implications for real-time 3D pose transfer in unconstrained environments and provides insights into adversarial perturbations in 3D generative tasks.
Abstract
3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks. Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources. However, obtaining those clean pose sources requires cumbersome pre-processing pipelines, which hinders real-time applications. This work is driven by the intuition that a model's robustness can be enhanced by introducing adversarial samples into training, making the model less vulnerable to noisy inputs; this can be further extended to handling real-world data such as raw point clouds or scans directly, without intermediate processing. Furthermore, we propose a novel 3D pose Masked Autoencoder (3D-PoseMAE), a customized MAE that effectively learns 3D extrinsic representations (i.e., pose). 3D-PoseMAE facilitates learning of extrinsic attributes by simultaneously generating adversarial samples that perturb the model and learning from arbitrary raw noisy poses via a multi-scale masking strategy. Both qualitative and quantitative studies show that the meshes transferred by our network are of much better quality. In addition, we demonstrate the strong generalizability of our method across various poses, different domains, and even raw scans. Experimental results also yield a meaningful insight: the intermediate adversarial samples generated during training can successfully attack existing pose transfer models.
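The two ingredients named in the abstract, adversarial samples generated during training and masking of noisy poses, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The perturbation uses an FGSM-style signed-gradient step, which is a swapped-in technique chosen for simplicity; the linear "model", the mean squared error loss, the keep ratios, and the names toy_transfer_loss, fgsm_perturb, and random_mask are all hypothetical. Masking at several keep ratios is only one possible reading of "multi-scale".

```python
import numpy as np


def toy_transfer_loss(pose_points, weights, target_points):
    """Mean squared error of a toy linear 'pose transfer' model (hypothetical)."""
    predicted = pose_points @ weights
    return float(np.mean((predicted - target_points) ** 2))


def fgsm_perturb(pose_points, weights, target_points, epsilon=0.01):
    """Shift each coordinate by +/- epsilon in the direction that raises the toy loss."""
    residual = pose_points @ weights - target_points
    # Analytic gradient of the mean squared error with respect to the input points.
    grad = (2.0 / residual.size) * residual @ weights.T
    return pose_points + epsilon * np.sign(grad)


def random_mask(points, keep_ratio, rng):
    """Keep a random subset of points, imitating an incomplete or sparse scan."""
    n_keep = max(1, int(round(keep_ratio * len(points))))
    keep_idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[np.sort(keep_idx)]


rng = np.random.default_rng(0)
pose = rng.normal(size=(1024, 3))    # stand-in for a source pose point cloud
weights = rng.normal(size=(3, 3))    # stand-in for a trained model (assumption)
target = rng.normal(size=(1024, 3))  # stand-in for ground-truth vertices

# Adversarial sample generated "on the fly" from the current model state.
adv_pose = fgsm_perturb(pose, weights, target, epsilon=0.01)
clean_loss = toy_transfer_loss(pose, weights, target)
adv_loss = toy_transfer_loss(adv_pose, weights, target)

# Mask the perturbed pose at several keep ratios (illustrative values only).
masked_views = [random_mask(adv_pose, ratio, rng) for ratio in (0.75, 0.5, 0.25)]
```

In an actual training loop the perturbed, masked poses would be fed back to the network alongside the clean ones, and the gradient would come from automatic differentiation rather than a closed-form expression.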
