FlexPose: Pose Distribution Adaptation with Limited Guidance

Zixiao Wang, Junwu Weng, Mengyuan Liu, Bei Yu

TL;DR

FlexPose tackles pose-domain adaptation when target supervision is scarce. It treats pose annotations as skeleton images and transfers a pre-trained pose generator through a lightweight, layer-wise transfer function $\tau$. Three regularizations (Linear & Sparse, Pose-mixup, and constrained layer updates) prevent collapse while the source pose prior is aligned to a target distribution $\mathcal{D}_t$ using only a few target poses $T$. The adapted generator can produce an unlimited number of target-domain poses and improves downstream tasks such as pose annotation, face landmark transfer, and pose-conditioned image generation. In cross-dataset settings it achieves state-of-the-art results or strong improvements over baselines such as AdaGAN, FreezeD, and LoRA. The results indicate data efficiency, computational efficiency, and applicability across pose-related domains, with practical impact on annotation pipelines and synthetic pose data generation for downstream models.
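
As a rough illustration of the idea of a linear transfer function with a sparsity penalty, the sketch below applies an identity-plus-correction map to the features of one layer. The exact form of $\tau$, the names `tau`, `delta_w`, `delta_b`, and the penalty weight are assumptions made for this example; the summary above does not specify them.

```python
import numpy as np

def tau(features, delta_w, delta_b):
    """Hypothetical linear transfer: identity plus a learned linear correction."""
    return features + features @ delta_w.T + delta_b

def sparsity_penalty(delta_w, delta_b, weight=1e-3):
    """L1 penalty that keeps the correction sparse (the weight is an assumed value)."""
    return weight * (np.abs(delta_w).sum() + np.abs(delta_b).sum())

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))   # toy batch of features from one generator layer
delta_w = np.zeros((8, 8))    # zero correction: tau starts as the identity map
delta_b = np.zeros(8)

adapted = tau(h, delta_w, delta_b)
penalty = sparsity_penalty(delta_w, delta_b)
```

With a zero correction the adapted features equal the source features, so under this assumed parameterization adaptation starts from the pre-trained generator and moves away from it only as far as the few target poses require.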

Abstract

Numerous well-annotated human key-point datasets are publicly available to date. However, annotating human poses for newly collected images is still a costly and time-consuming process. Pose distributions from different datasets share similar pose hinge-structure priors with different geometric transformations, such as pivot orientation, joint rotation, and bone length ratio. The difference between pose distributions is essentially the difference between the transformation distributions. Inspired by this fact, we propose a method to calibrate a pre-trained pose generator, in which the pose prior has already been learned, into an adapted one that follows a new pose distribution. We treat the representation of human pose joint coordinates as a skeleton image and transfer a pre-trained pose annotation generator with only a few annotations as guidance. By fine-tuning a limited number of linear layers that are closely related to the pose transformation, the adapted generator is able to produce any number of pose annotations that are similar to the target poses. We evaluate our proposed method, FlexPose, on several cross-dataset settings both qualitatively and quantitatively, and the results demonstrate that our approach achieves state-of-the-art performance compared to the existing generative-model-based transfer learning methods when given limited annotation guidance.
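
To make the "joint coordinates as a skeleton image" representation concrete, here is a minimal sketch that rasterizes a toy pose into a binary image. The five-joint skeleton, the `BONES` list, the image size, and the function name `render_skeleton` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton; real datasets define their own joints and bones.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]

def render_skeleton(joints, bones=BONES, size=64, samples=100):
    """Rasterize normalized (x, y) joints in [0, 1] into a binary size x size image."""
    joints = np.asarray(joints, dtype=float)
    image = np.zeros((size, size), dtype=np.uint8)
    t = np.linspace(0.0, 1.0, samples)[:, None]  # interpolation weights along a bone
    for a, b in bones:
        points = (1.0 - t) * joints[a] + t * joints[b]  # sampled points on the bone
        cols = np.clip(np.rint(points[:, 0] * (size - 1)).astype(int), 0, size - 1)
        rows = np.clip(np.rint(points[:, 1] * (size - 1)).astype(int), 0, size - 1)
        image[rows, cols] = 1  # mark every sampled point as part of the skeleton
    return image

toy_pose = [(0.5, 0.1), (0.5, 0.4), (0.2, 0.6), (0.8, 0.6), (0.5, 0.9)]
skeleton = render_skeleton(toy_pose)
```

Representing poses this way lets an ordinary image generator model them, which is what makes a pre-trained generator a candidate for transfer in the first place.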

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: An illustration of how poses can be adapted between different domains. Although various pose datasets may differ in their transformations, they share a common hinge-structure prior. FlexPose's adaptation process focuses on the transformation, and the resulting poses can be used in a wide range of downstream pose-related tasks.
  • Figure 2: An illustration of the FlexPose framework for pose distribution adaptation. The framework has three main steps: (1) we train a skeleton image generator to learn the pose prior from the source pose distribution; (2) the source generator is transferred to a target generator with limited target pose guidance to achieve pose distribution adaptation; (3) we use the target generator to generate target pose annotations for downstream tasks.
  • Figure 3: An illustration of the generator decomposition. We use $\tau(\cdot)$ to adjust the source generator for pose distribution adaptation.
  • Figure 4: Reconstruction loss with different choices of layer $l$.
  • Figure 5: Visualization of pose adaptation. The left and middle of each row are generated from the same random noise. The middle aims to mimic the right.
  • ...and 5 more figures