Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation
Sarosij Bose, Hannah Dela Cruz, Arindam Dutta, Elena Kokkoni, Konstantinos Karydis, Amit K. Roy-Chowdhury
TL;DR
SHIFT addresses infant pose estimation under limited labeled data by transferring knowledge from synthetic adult data through an unsupervised domain adaptation (UDA) framework. It combines a mean-teacher consistency mechanism, an offline infant manifold pose prior, and a context-aware pose-image alignment module (Kp2Seg) to enforce anatomical plausibility and visual coherence under self-occlusion. The approach is the first UDA solution for infant pose estimation and shows substantial performance gains over previous UDA methods and even over supervised infant pose estimation baselines. These results highlight its potential as a data-efficient, privacy-friendly tool for neuromotor assessment, safety monitoring, and assistive robotics. Key contributions include an offline infant pose prior implemented with PoseNDF, the Kp2Seg mapping for pose-to-segmentation guidance, and extensive ablations validating the necessity of each component.
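To make the role of a manifold pose prior concrete, here is a minimal sketch. It is not the PoseNDF-based prior used by SHIFT. PoseNDF learns a neural distance field over poses; in this sketch, as a loudly labeled assumption, that learned field is replaced by the nearest-neighbour distance to a small bank of plausible 2D poses, which are assumed to be already root-centred and scale-normalised. The names `pose_distance`, `manifold_distance`, `prior_penalty`, and `margin` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One (x, y) coordinate per keypoint; poses are ASSUMED already root-centred and scale-normalised.
Pose = Sequence[Tuple[float, float]]


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean per-joint Euclidean distance between two poses with the same joint count."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of joints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def manifold_distance(pose: Pose, pose_bank: List[Pose]) -> float:
    """Hypothetical stand-in for a learned distance field such as PoseNDF:
    distance from `pose` to its nearest neighbour in a bank of plausible poses."""
    return min(pose_distance(pose, ref) for ref in pose_bank)


def prior_penalty(pred_poses: List[Pose], pose_bank: List[Pose], margin: float = 0.0) -> float:
    """Average hinge penalty: predictions within `margin` of the bank cost nothing,
    predictions farther away are penalised by their excess distance."""
    if not pred_poses:
        return 0.0
    penalties = [max(0.0, manifold_distance(p, pose_bank) - margin) for p in pred_poses]
    return sum(penalties) / len(penalties)
```

A prediction that matches a stored plausible pose incurs zero penalty, while implausible predictions are pushed back toward the bank. A learned distance field plays the same role but generalises beyond the stored examples and is differentiable, which is what makes it usable as a training loss.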
Abstract
Human pose estimation is a critical tool across a variety of healthcare applications. Despite significant progress in pose estimation algorithms targeting adults, such developments for infants remain limited. Existing algorithms for infant pose estimation, despite achieving commendable performance, depend on fully supervised approaches that require large amounts of labeled data. These algorithms also generalize poorly under distribution shifts. To address these challenges, we introduce SHIFT: Leveraging SyntHetic Adult Datasets for Unsupervised InFanT Pose Estimation. SHIFT leverages the pseudo-labeling-based mean-teacher framework to compensate for the lack of labeled data and addresses distribution shifts by enforcing consistency between the student predictions and the teacher pseudo-labels. Additionally, to penalize implausible predictions obtained from the mean-teacher framework, we incorporate an infant manifold pose prior. To enhance SHIFT's ability to perceive self-occlusion, we propose a novel visibility consistency module that improves the alignment of the predicted poses with the original image. Extensive experiments on multiple benchmarks show that SHIFT significantly outperforms existing state-of-the-art unsupervised domain adaptation (UDA) pose estimation methods by 5% and supervised infant pose estimation methods by 16%. The project page is available at: https://sarosijbose.github.io/SHIFT.
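The mean-teacher idea described above can be sketched in plain Python under simplifying assumptions: parameters are a toy dictionary of scalars, each keypoint heatmap is a flattened list of values, and pseudo-labels are filtered by a hypothetical confidence threshold on the teacher's peak heatmap value. The teacher's weights are an exponential moving average (EMA) of the student's, and the student is trained to agree with the teacher on unlabeled infant images. The function names, `alpha`, and `threshold` are illustrative and are not taken from the SHIFT code.

```python
from typing import Dict, List

Params = Dict[str, float]  # toy parameter set: name -> scalar weight
Heatmap = List[float]      # one flattened heatmap per keypoint


def ema_update(teacher: Params, student: Params, alpha: float = 0.99) -> Params:
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def consistency_loss(
    student_heatmaps: List[Heatmap],
    teacher_heatmaps: List[Heatmap],
    threshold: float = 0.5,
) -> float:
    """Mean squared error between student and teacher heatmaps, averaged over keypoints
    whose teacher confidence (peak heatmap value) reaches `threshold` (hypothetical filter)."""
    total, kept = 0.0, 0
    for s_map, t_map in zip(student_heatmaps, teacher_heatmaps):
        if max(t_map) < threshold:
            continue  # teacher is unsure about this keypoint, so it yields no pseudo-label
        total += sum((s - t) ** 2 for s, t in zip(s_map, t_map)) / len(s_map)
        kept += 1
    return total / kept if kept else 0.0
```

In a real training loop only the student receives gradients from this loss; the teacher changes solely through `ema_update` after each student step, which makes its pseudo-labels smoother and more stable than the student's own predictions. Mean-teacher setups commonly feed the teacher a weakly augmented view and the student a strongly augmented view of the same image.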
