On the representation and methodology for wide and short range head pose estimation

Alejandro Cobo; Roberto Valle; José M. Buenaposada; Luis Baumela

TL;DR

This work analyzes head pose estimation (HPE) across short-range (SRHP) and wide-range (WRHP) rotations, clarifying the roles of orientation representations and distance metrics. It argues that Euler angles are an effective representation for SRHP but, because of gimbal lock, are not suitable as a distance metric, and it advocates continuous representations (6D) for WRHP together with the geodesic distance $g_{GE}$. The authors introduce the Opal loss, which generalizes $g_{GE}$ to control the contribution of each training sample, and a cross-dataset alignment procedure that addresses reference-system misalignment, achieving a new state of the art (SOTA) on the 300W-LP/Biwi cross-dataset setup. They also establish a Panoptic-based WRHP benchmark, demonstrating that 6D with Opal yields strong WRHP performance and highlighting the impact of representation choice on cross-dataset evaluations and real-world deployments.
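As a rough illustration of the two ingredients named above, the sketch below maps a 6D vector to a rotation matrix by Gram-Schmidt orthogonalization and computes the geodesic angle between two rotation matrices as $\arccos((\mathrm{tr}(R_a^\top R_b) - 1)/2)$. The function names `rotation_from_6d` and `geodesic_deg` are hypothetical, and the column ordering of the 6D vector is an assumption; this page does not state the paper's exact conventions, and the Opal loss itself is not reproduced here.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(p):
    """Map a 6D vector to a 3x3 rotation matrix (assumed layout: first
    three values seed column 1, last three seed column 2)."""
    a1, a2 = p[:3], p[3:]
    b1 = normalize(a1)
    d = sum(x * y for x, y in zip(b1, a2))
    # Remove the component of a2 along b1, then normalize (Gram-Schmidt).
    b2 = normalize([y - d * x for x, y in zip(b1, a2)])
    b3 = cross(b1, b2)
    # b1, b2, b3 become the columns of the matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]

def geodesic_deg(Ra, Rb):
    """Geodesic angle in degrees between two rotation matrices."""
    # trace(Ra^T Rb) equals the sum of element-wise products.
    tr = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(c))

identity = rotation_from_6d([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
yaw_90 = rotation_from_6d([0.0, 0.0, -1.0, 0.0, 1.0, 0.0])  # 90 degrees about y
unnormalized = rotation_from_6d([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])

print(geodesic_deg(identity, yaw_90))
print(geodesic_deg(identity, unnormalized))
```

Because any pair of non-parallel 3-vectors maps to a valid rotation, a network can regress the six values freely, which is the usual argument for this representation being continuous.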

Abstract

Head pose estimation (HPE) is a problem of interest in computer vision to improve the performance of face processing tasks in semi-frontal or profile settings. Recent applications require the analysis of faces in the full 360° rotation range. Traditional approaches to solve the semi-frontal and profile cases are not directly amenable to the full rotation case. In this paper we analyze the methodology for short- and wide-range HPE and discuss which representations and metrics are adequate for each case. We show that the popular Euler angles representation is a good choice for short-range HPE, but not at extreme rotations. However, the Euler angles' gimbal lock problem prevents them from being used as a valid metric in any setting. We also revisit the current cross-data set evaluation methodology and note that the lack of alignment between the reference systems of the training and test data sets negatively biases the results of all articles in the literature. We introduce a procedure to quantify this misalignment and a new methodology for cross-data set HPE that establishes new, more accurate, SOTA for the 300W-LP|Biwi benchmark. We also propose a generalization of the geodesic angular distance metric that enables the construction of a loss that controls the contribution of each training sample to the optimization of the model. Finally, we introduce a wide-range HPE benchmark based on the CMU Panoptic data set.
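The claim that gimbal lock makes Euler angles unusable as a metric can be illustrated with a minimal sketch. It assumes a Z-Y-X composition $R = R_z R_y R_x$; the paper's own axis assignment for yaw, pitch and roll is not stated on this page, so the helper names and the convention are assumptions. With the middle angle at 90°, two different angle triples describe the same rotation, so the mean absolute error (MAE) over Euler angles is large while the geodesic distance is essentially zero.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def euler_zyx_to_matrix(z_deg, y_deg, x_deg):
    """Assumed convention: R = Rz(z) * Ry(y) * Rx(x)."""
    return matmul(rot_z(z_deg), matmul(rot_y(y_deg), rot_x(x_deg)))

def geodesic_deg(Ra, Rb):
    """Geodesic angle in degrees between two rotation matrices."""
    tr = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(c))

def euler_mae(e1, e2):
    """Mean absolute error over the three Euler angles, in degrees."""
    return sum(abs(a - b) for a, b in zip(e1, e2)) / 3.0

# Two angle triples that coincide as rotations when the middle angle is 90.
pose_a = (80.0, 90.0, 0.0)
pose_b = (0.0, 90.0, -80.0)

mae = euler_mae(pose_a, pose_b)
geo = geodesic_deg(euler_zyx_to_matrix(*pose_a), euler_zyx_to_matrix(*pose_b))
print(f"Euler MAE: {mae:.2f} deg, geodesic distance: {geo:.6f} deg")
```

At a 90° middle angle the matrix depends only on the difference between the first and third angles, which is why both triples collapse to one pose.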

Paper Structure (16 sections, 9 equations, 11 figures, 5 tables)

Introduction
Related work
Head pose representation
Loss functions and metrics for HPE
Representation and methodology for HPE
The gimbal lock
Discontinuity
Reference systems alignment for cross-data set evaluation
Opal loss function for HPE
Experiments
Data sets
Implementation details
Synthetic experiments
SRHP results
WRHP results
...and 1 more section

Figures (11)

Figure 1: Applications involving SRHP and WRHP configurations. Images from 300W-LP Zhu19b, WIDER Face Yang16b, Biwi Fanelli13 and CMU Panoptic Joo17 data sets.
Figure 2: Concept diagram of our analysis. Given an RGB image containing a cropped face, the HPE algorithm produces a pose representation, $\mathbf{p}$. Independently of the internal representation used by the model, predictions can be converted to Euler angles, quaternions or rotation matrices, and the estimation error can be measured using different metrics.
Figure 3: WRHP means estimating the rotation matrix $\mathtt{R}$ that aligns the reference frame of the head with that of the camera. We show some WRHP results by projecting a 3D axis onto the image plane coordinates. The text below shows the [yaw, pitch, roll] angles.
Figure 4: All faces have a visually very similar configuration, but the MAE is very large due to gimbal lock. However, the geodesic distance is coherent. Color code: [yaw, pitch, roll].
Figure 5: Discontinuity in quaternions under a rotation around the yaw axis (from $0^\circ$ to $360^\circ$). Component $q_y$ shows an abrupt change from -1 to +1 when the yaw reaches $180^\circ$.
...and 6 more figures
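
The quaternion discontinuity described in the Figure 5 caption can be reproduced with a small sketch. It assumes a (w, x, y, z) component order and a canonical form that keeps $w \geq 0$; both are assumptions made for illustration, and the direction of the jump in $q_y$ (here from about +1 to about -1) depends on the sign convention chosen. The function names are hypothetical.

```python
import math

def yaw_quaternion(yaw_deg):
    """Unit quaternion (w, x, y, z) for a rotation about the y axis,
    canonicalised so that w >= 0 (assumed convention)."""
    half = math.radians(yaw_deg) / 2.0
    q = (math.cos(half), 0.0, math.sin(half), 0.0)
    if q[0] < 0.0:
        # q and -q encode the same rotation; flip to keep w non-negative.
        q = tuple(-c for c in q)
    return q

def rotation_angle_deg(q1, q2):
    """Angle in degrees between the rotations encoded by two unit quaternions."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))

before = yaw_quaternion(179.0)
after = yaw_quaternion(181.0)

print("q_y just before 180 deg:", before[2])
print("q_y just after 180 deg: ", after[2])
print("rotation angle between the two poses:", rotation_angle_deg(before, after))
```

The two poses differ by only 2° of rotation, yet their $q_y$ components differ by almost 2, so a regression loss applied directly to quaternion components sees a large error for a small change in pose.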
