Domain adaptive pose estimation via multi-level alignment

Yugan Chen, Lin Zhao, Yalong Xu, Honglei Zu, Xiaoqi An, Guangyu Li

TL;DR

This work tackles unsupervised domain adaptation for 2D pose estimation by introducing a multi-level alignment framework that bridges the gap between synthetic source data and real-world target data. Built on a mean-teacher architecture, it combines image-level style transfer via AdaIN, feature-level adversarial alignment with a gradient-reversal discriminator, and pose-level self-supervised information maximization to learn robust, domain-invariant representations. The method achieves state-of-the-art results on both human and animal pose benchmarks, outperforming prior methods by up to 2.4% on human tasks and by up to 3.1% (dogs) and 1.4% (sheep) on animal tasks, with ablations showing that the alignment levels are complementary. The approach reduces reliance on labeled real data and improves cross-domain generalization.
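
As a rough illustration of the image-level step, the sketch below shows the standard AdaIN operation: each content channel is re-normalized to the mean and standard deviation of the corresponding style channel. It is a minimal sketch, not the authors' code. The function name `adain`, the flat per-channel list layout, the `eps` value, and the toy numbers are all assumptions made for this example; which domain supplies the style and where AdaIN sits in the pipeline are not stated in this summary.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Align each content channel to the mean/std of the matching style channel.

    content, style: lists of channels; each channel is a flat list of
    activations (spatial positions flattened). This layout is hypothetical.
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = fmean(c_chan), pstdev(c_chan) + eps
        s_mean, s_std = fmean(s_chan), pstdev(s_chan) + eps
        # Whiten the content channel, then re-color it with the style statistics.
        out.append([s_std * (x - c_mean) / c_std + s_mean for x in c_chan])
    return out


# Toy one-channel example: source activations restyled with target statistics.
source_feat = [[0.0, 1.0, 2.0, 3.0]]
target_feat = [[10.0, 10.5, 11.0, 11.5, 12.0]]
stylized = adain(source_feat, target_feat)
```

After the call, the stylized channel keeps the relative ordering of the source activations but carries the target channel's first- and second-order statistics (up to the small `eps` term).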

Abstract

Domain adaptive pose estimation aims to enable deep models trained on source domain (synthesized) datasets to produce similar results on target domain (real-world) datasets. Existing methods have made significant progress by conducting image-level or feature-level alignment. However, aligning at only a single level is not sufficient to fully bridge the domain gap and achieve excellent domain adaptive results. In this paper, we propose a multi-level domain adaptation approach, which aligns different domains at the image, feature, and pose levels. Specifically, we first utilize image style transfer to ensure that images from the source and target domains have a similar distribution. Subsequently, at the feature level, we employ adversarial training to make the features from the source and target domains preserve domain-invariant characteristics as much as possible. Finally, at the pose level, a self-supervised approach is utilized to enable the model to learn diverse knowledge, implicitly addressing the domain gap. Experimental results demonstrate that significant improvement can be achieved by the proposed multi-level alignment method in pose estimation, which outperforms the previous state of the art in human pose estimation by up to 2.4% and in animal pose estimation by up to 3.1% for dogs and 1.4% for sheep.
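
The feature-level adversarial training is described in the TL;DR and in Figure 1 as using a gradient reversal layer (GRL). The sketch below shows the generic GRL idea only: the forward pass is the identity, and the backward pass flips the sign of the gradient coming from the domain discriminator and scales it. The class name `GradientReversal`, the manual `forward`/`backward` methods, and the strength `lam` are assumptions for illustration; a real implementation would hook into an autograd framework, and the paper's actual setting of the scaling factor is not given here.

```python
class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical reversal strength

    def forward(self, features):
        # Features reach the discriminator unchanged.
        return list(features)

    def backward(self, grad_from_discriminator):
        # The backbone receives the reversed gradient, so minimizing the
        # discriminator loss pushes the backbone to confuse the discriminator.
        return [-self.lam * g for g in grad_from_discriminator]


grl = GradientReversal(lam=0.5)
feats = [0.2, -1.0, 3.0]
passed = grl.forward(feats)
grad_to_backbone = grl.backward([1.0, -2.0, 4.0])
```

Because the discriminator and the feature extractor see opposite gradient signs, a single backward pass trains both sides of the adversarial game.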

Paper Structure (14 sections, 9 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Overview of our proposed method. The framework consists of two branches: 1) a Student model for cross-domain learning and 2) a Teacher model that provides pseudo-labels for the Student model. We employ image-level alignment through style transfer during input image processing, feature-level alignment through adversarial learning, and pose-level alignment through self-supervised learning to update the Student model's parameters, and use an exponential moving average (EMA) to update the Teacher model. GRL denotes the Gradient Reversal Layer; together with a domain discriminator, it is used to align the distributions of the two domains.
  • Figure 2: Analysis of the influence of parameters on SURREAL$\to$LSP.
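
The EMA teacher update mentioned in the Figure 1 caption can be sketched as follows. This is a minimal sketch of the generic mean-teacher rule, teacher ← decay · teacher + (1 − decay) · student, applied per parameter. The function name `ema_update`, the dict-of-floats parameter layout, and the decay values are assumptions for illustration, not values taken from the paper.

```python
def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved toward the student weights.

    teacher, student: dicts mapping parameter names to floats (a stand-in
    for real tensors). decay close to 1 makes the teacher change slowly.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 2.0, "b": 1.0}

# One update with an exaggerated decay so the movement is visible.
teacher = ema_update(teacher, student, decay=0.9)
```

Only the student receives gradient updates; the teacher is a smoothed copy of it, which is what makes its pseudo-labels more stable than the student's own predictions.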