Semi-Supervised Unconstrained Head Pose Estimation in the Wild

Huayi Zhou, Fei Jiang, Jin Yuan, Yong Rui, Hongtao Lu, Kui Jia

TL;DR

SemiUHPE tackles unconstrained head pose estimation in the wild by combining semi-supervised rotation regression with a probabilistic rotation model based on the matrix Fisher distribution. Within a Mean-Teacher framework, it introduces three core strategies for leveraging abundant unlabeled wild heads: aspect-ratio invariant cropping, dynamic entropy-based pseudo-label filtering, and head-oriented strong augmentations (CutOcc and RotCons). The reported results show improvements over fully supervised and prior SSL methods on the front-range (AFLW2000) and full-range (DAD-3DHeads) benchmarks, along with robustness to occlusion, extreme poses, and real-world variability. The approach also carries over to related tasks such as generic object rotation regression and 3D head reconstruction.

Abstract

Existing research on unconstrained in-the-wild head pose estimation suffers from flaws in its datasets, which consist either of numerous samples obtained by non-realistic synthesis or constrained collection, or of small-scale natural images with plausible manual annotations. Fully supervised solutions are therefore compromised by their reliance on abundant labels. To alleviate this, we propose SemiUHPE, the first semi-supervised unconstrained head pose estimation method, which can leverage abundant, easily available unlabeled head images. Technically, we choose semi-supervised rotation regression and adapt it to the error-sensitive and label-scarce problem of unconstrained head pose. Our method is based on the observation that aspect-ratio invariant cropping of wild heads is superior to the previous landmark-based affine alignment, given that landmarks of unconstrained human heads are usually unavailable, especially for underexplored non-frontal heads. Instead of using a pre-fixed threshold to filter out pseudo-labeled heads, we propose dynamic entropy-based filtering, which adaptively removes unlabeled outliers as training progresses by updating the threshold in multiple stages. We then revisit the design of weak-strong augmentations and improve it by devising two novel head-oriented strong augmentations, termed pose-irrelevant cut-occlusion and pose-altering rotation consistency, respectively. Extensive experiments and ablation studies show that SemiUHPE greatly outperforms its counterparts on public benchmarks under both the front-range and full-range settings. Furthermore, our proposed method is also beneficial for other closely related problems, including generic object rotation regression and 3D head reconstruction, demonstrating good versatility and extensibility. Code is available at https://github.com/hnuzhy/SemiUHPE.
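
To make the idea of dynamic entropy-based filtering more concrete, here is a minimal sketch in plain Python. It is not taken from the SemiUHPE code base. It assumes that each pseudo-labeled head comes with a scalar prediction entropy (lower meaning more confident) and that the threshold is recomputed at each stage so that a chosen fraction of the lowest-entropy samples is kept. The function names, the `schedule` format, and the keep ratios are hypothetical choices for illustration only.

```python
import math


def entropy_threshold(entropies, keep_ratio):
    """Return the entropy value at or below which about `keep_ratio` of samples fall."""
    if not entropies:
        raise ValueError("entropies must be non-empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(entropies)
    cutoff_index = max(0, math.ceil(keep_ratio * len(ranked)) - 1)
    return ranked[cutoff_index]


def keep_ratio_for_step(step, schedule):
    """Pick the keep ratio of the latest stage whose start step has been reached.

    `schedule` is a list of (start_step, keep_ratio) pairs sorted by start_step.
    The stage boundaries and ratios are assumptions, not values from the paper.
    """
    ratio = schedule[0][1]
    for start_step, stage_ratio in schedule:
        if step >= start_step:
            ratio = stage_ratio
    return ratio


def filter_pseudo_labels(entropies, step, schedule):
    """Return indices of unlabeled samples whose entropy passes the current threshold."""
    threshold = entropy_threshold(entropies, keep_ratio_for_step(step, schedule))
    return [i for i, h in enumerate(entropies) if h <= threshold]


if __name__ == "__main__":
    batch_entropies = [0.9, 0.1, 0.5, 0.3]       # hypothetical entropies
    stages = [(0, 0.5), (1000, 0.75)]            # hypothetical two-stage schedule
    print(filter_pseudo_labels(batch_entropies, step=0, schedule=stages))
    print(filter_pseudo_labels(batch_entropies, step=1000, schedule=stages))
```

Because the threshold is derived from the current entropy values rather than fixed in advance, it follows the model's confidence as training progresses, which is the contrast with a pre-fixed threshold that the abstract draws.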

Paper Structure (24 sections, 9 equations, 16 figures, 10 tables)

This paper contains 24 sections, 9 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: Our unconstrained head pose estimation results on wild challenging heads (e.g., heavy blur, extreme illumination, severe occlusion, atypical pose, and invisible face). Images are all selected from the COCO dataset (Lin et al., 2014), which has no head pose labels.
  • Figure 2: Examples from the front-range datasets 300W-LP (Zhu et al., 2016; top), whose synthesized profile faces show many obvious artifacts, and BIWI (Fanelli et al., 2013; middle), collected in lab environments with only 24 sequences and very limited diversity, and from the full-range dataset DAD-3DHeads (Martyniuk et al., 2022; bottom), with laboriously annotated 3D head mesh labels on 2D images.
  • Figure 3: Illustration of the SemiUHPE framework. We leverage small-scale labeled heads and large-scale unlabeled wild heads to optimize the teacher-student mutual learning Mean-Teacher framework. The three customized strategies are marked in red. We finally keep the student model, which is more efficient and robust for HPE evaluation.
  • Figure 4: Illustration of how naive cropping and resizing alters the perceived orientation and introduces affine deformation.
  • Figure 5: The illustration of prediction entropies corresponding to their head samples in the unlabeled dataset (e.g. COCOHead).
  • ...and 11 more figures
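
The deformation that Figure 4 refers to can be avoided by making the crop square before resizing. The sketch below shows one simple way to do that, assuming an axis-aligned head bounding box `(x1, y1, x2, y2)` in pixels. The function name and the `scale` parameter are hypothetical, and this is not claimed to be the paper's exact cropping procedure.

```python
def square_crop_box(x1, y1, x2, y2, scale=1.0):
    """Expand a head bounding box to a square around its center.

    Resizing a square crop to a square network input scales both axes by the
    same factor, so the head is not stretched. `scale` (an assumed parameter)
    optionally enlarges the square to include more context.
    """
    width = x2 - x1
    height = y2 - y1
    if width <= 0 or height <= 0:
        raise ValueError("bounding box must have positive width and height")
    side = max(width, height) * scale
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    half = side / 2.0
    return (center_x - half, center_y - half, center_x + half, center_y + half)


if __name__ == "__main__":
    # A tall 40x100 box becomes a 100x100 box around the same center.
    print(square_crop_box(10, 20, 50, 120))
```

The returned box may extend beyond the image borders (the example yields a negative left coordinate), so a caller would need to pad the image, for instance with zeros, before cropping.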