Latent Embedding Clustering for Occlusion Robust Head Pose Estimation

José Celestino, Manuel Marques, Jacinto C. Nascimento

TL;DR

The paper tackles occlusion robustness in head pose estimation (HPE) by introducing Latent Embedding Clustering for Head Pose Estimation (LEC-HPE). The method combines unsupervised clustering of latent embeddings with a multi-loss Euler-angle regression/classification framework, trained in two stages. Using only $K$ cluster centers with $K \ll N$, it achieves competitive occluded HPE performance on BIWI, AFLW2000, and Pandora while reducing latent-space labeling requirements. The ablation study shows that balancing the clustering term against the fine-grained Euler-angle losses yields the best occlusion-robust results, and the authors discuss the potential for lighter backbones and automatic selection of the number of clusters.

Abstract

Head pose estimation has become a crucial area of research in computer vision given its usefulness in a wide range of applications, including robotics, surveillance, and driver attention monitoring. One of the most difficult challenges in this field is managing head occlusions, which frequently occur in real-world scenarios. In this paper, we propose a novel and efficient framework that is robust in real-world head occlusion scenarios. In particular, we propose an unsupervised latent embedding clustering with regression and classification components for each pose angle. The model optimizes latent feature representations for occluded and non-occluded images through a clustering term while improving fine-grained angle predictions. Experimental evaluation on in-the-wild head pose benchmark datasets reveals competitive performance in comparison to state-of-the-art methodologies, with the advantage of a significant data reduction. We observe a substantial improvement in occluded head pose estimation. An ablation study is also conducted to ascertain the impact of the clustering term within our proposed framework.
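To make the idea of combining a clustering term with per-angle classification and regression losses more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the nearest-center squared-distance clustering term, the bin-based cross-entropy plus expected-angle MSE, and the weights `alpha` and `gamma` are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the bin axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def angle_multi_loss(logits, angle_deg, bin_centers, alpha=1.0):
    """Assumed multi-loss for one Euler angle: cross-entropy over angle
    bins plus MSE between the softmax-expected angle and the target."""
    probs = softmax(logits)
    # The bin whose center is nearest to the target is the class label.
    target_bin = np.abs(angle_deg[:, None] - bin_centers[None, :]).argmin(axis=1)
    ce = -np.log(probs[np.arange(len(angle_deg)), target_bin] + 1e-12).mean()
    expected = probs @ bin_centers          # fine-grained angle estimate
    mse = ((expected - angle_deg) ** 2).mean()
    return ce + alpha * mse

def clustering_loss(z, centers):
    """Assumed clustering term: mean squared distance from each latent
    embedding to its nearest of the K cluster centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def total_loss(z, centers, logits_per_angle, angles_deg, bin_centers,
               gamma=0.1, alpha=1.0):
    # Sum the per-angle losses, then add the weighted clustering term.
    pose = sum(
        angle_multi_loss(logits_per_angle[k], angles_deg[k], bin_centers, alpha)
        for k in ("yaw", "pitch", "roll")
    )
    return pose + gamma * clustering_loss(z, centers)
```

In this sketch, `gamma` plays the role of the balance between clustering and fine-grained angle prediction that the ablation study examines; the actual weighting scheme and loss definitions should be taken from the paper itself.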

Paper Structure (18 sections, 9 equations, 6 figures, 6 tables)


Figures (6)

  • Figure 1:
  • Figure 2:
  • Figure 4:
  • Figure 5:
  • Figure 7: Network structure for LEC-HPE. The architecture includes a branch for clustering of feature space embeddings and one multi-loss branch for each predicted Euler angle ($\hat{\theta} \in \{\text{yaw}, \text{pitch}, \text{roll}\}$), to ensure continuous feature learning and avoid distortion of the latent space.
  • ...and 1 more figure
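The branch layout described in the Figure 7 caption (one clustering branch on the latent embeddings and one multi-loss branch per Euler angle) can be illustrated with a shape-level sketch. Everything below is hypothetical: the dimensions, the linear layers standing in for the backbone and heads, the nearest-center assignment, and the choice to attach the angle heads to the embedding are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: backbone feature width, embedding width,
# number of angle bins per head, and number of cluster centers K.
FEAT_DIM, EMB_DIM, N_BINS, K = 512, 64, 66, 10

params = {
    "embed": rng.normal(0.0, 0.01, (FEAT_DIM, EMB_DIM)),
    "centers": rng.normal(0.0, 1.0, (K, EMB_DIM)),
    "heads": {
        name: rng.normal(0.0, 0.01, (EMB_DIM, N_BINS))
        for name in ("yaw", "pitch", "roll")
    },
}

def forward(features, params):
    """Map backbone features to a latent embedding, then feed both branches."""
    z = features @ params["embed"]                      # (batch, EMB_DIM)
    # Clustering branch: index of the nearest center for each embedding.
    d2 = ((z[:, None, :] - params["centers"][None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)                     # (batch,)
    # One classification/regression head per Euler angle.
    logits = {name: z @ w for name, w in params["heads"].items()}
    return z, assignments, logits

features = rng.normal(size=(4, FEAT_DIM))               # stand-in for backbone output
z, assignments, logits = forward(features, params)
print(z.shape, assignments.shape, logits["yaw"].shape)
```

The point of the sketch is only that all branches share one embedding `z`, so gradients from the clustering term and from the three angle heads shape the same latent space.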