Mathematical Foundation and Corrections for Full Range Head Pose Estimation

Huei-Chung Hu, Xuyang Wu, Yuan Wang, Yi Fang, Hsin-Tai Wu

TL;DR

The paper addresses the core problem of ambiguous and inconsistent coordinate systems and Euler-angle conventions in full-range head pose estimation. It proposes a rigorous framework to infer rotation systems from code, convert poses between systems, derive 2D augmentation formulas for rotation matrices, and implement correct drawing routines, all while harmonizing left- vs. right-handed and intrinsic vs. extrinsic conventions. Through detailed analyses of 300W-LP, CMU Panoptic, WHENet, 3DDFA/3DDFA_v2, and the 6D-RepNet family, the work reveals their exact rotation definitions and demonstrates robust, low-error conversions (e.g., Frobenius- or geodesic-based), along with practical drawing methods (three-line drawing vs. draw_axis) that preserve pose semantics. The findings enable consistent cross-dataset pose labeling, reliable full-range pose visualization, and improved data augmentation for HPE, all of which matter for benchmarking and for building next-generation full-range head pose systems. Overall, the work provides mathematical clarity and practical tools for precise, comparable, and augmentation-friendly head pose research, with immediate impact on dataset construction, model evaluation, and reproducibility in HPE.

Abstract

Numerous works on head pose estimation (HPE) offer algorithms or neural network-based approaches for extracting Euler angles either from facial key points or directly from images of the head region. However, many of these works fail to clearly define the coordinate systems and the order of the Euler or Tait-Bryan angles in use. Rotation matrices depend on the coordinate system, and yaw, roll, and pitch angles are sensitive to the order in which they are applied. Without precise definitions, it becomes challenging to validate the correctness of the output head poses and of the drawing routines employed in prior works. In this paper, we thoroughly examine the Euler angles defined in the 300W-LP dataset and in head pose estimation methods such as 3DDFA-v2, 6D-RepNet, and WHENet, as well as the validity of their drawing routines for the Euler angles. When necessary, we infer their coordinate system and the sequence of yaw, roll, and pitch from the provided code. This paper presents (1) code and algorithms for inferring the coordinate system and the Euler angle application order from provided source code, and for extracting precise rotation matrices and Euler angles, (2) code and algorithms for converting poses from one rotation system to another, (3) novel formulae for 2D augmentations of the rotation matrices, and (4) derivations and code for the correct drawing routines for rotation matrices and poses. This paper also addresses the feasibility of defining rotations with the right-handed coordinate system used by Wikipedia and SciPy, which makes Euler angle extraction much easier for full-range head pose research.
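The abstract's point that the angles are sensitive to their application order can be checked numerically. The sketch below is only an illustration, not code from the paper: it assumes right-handed elemental rotations acting on column vectors, and the helper names (`rot_x`, `rot_y`, `matmul`, `max_abs_diff`) are hypothetical. It applies two 90-degree rotations about fixed axes in both orders and measures how far apart the results are.

```python
import math

def rot_x(t):
    """Right-handed rotation by angle t (radians) about the x-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    """Right-handed rotation by angle t (radians) about the y-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def max_abs_diff(a, b):
    """Largest entry-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

theta = math.radians(90)

# With column vectors, the rotation applied first is the rightmost factor.
x_then_y = matmul(rot_y(theta), rot_x(theta))  # rotate about x, then about y
y_then_x = matmul(rot_x(theta), rot_y(theta))  # rotate about y, then about x

order_gap = max_abs_diff(x_then_y, y_then_x)
print(order_gap)  # clearly non-zero: the two orders give different rotations
```

Which axis is called yaw, pitch, or roll differs between datasets and libraries, so the axis names here are deliberately generic.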

Paper Structure

This paper contains 28 sections, 6 theorems, 39 equations, 19 figures, 1 table.

Key Result

Lemma 3.3

Suppose $\bm{e_{1}}$ and $\bm{e_{2}}$ are the orthogonal unit axis vectors defined in Definition def:extrinsic_def. Given the elemental rotation sequence that first rotates by $\theta_{1}$ about $\bm{e_{1}}$ and then rotates by $\theta_{2}$ about $\bm{e_{2}}$, we can derive that the intrinsic rotation is …
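The lemma's conclusion is cut off in this listing, so the sketch below does not reproduce it. It only checks the standard relation between the two conventions that the lemma concerns, under stated assumptions: right-handed rotations, column vectors, and hypothetical helper names (`axis_angle`, `matmul`, `matvec`, `max_abs_diff`). Rotating by $\theta_{1}$ about $\bm{e_{1}}$ and then by $\theta_{2}$ about the already rotated second axis (intrinsic) gives the product $R_{1}R_{2}$, whereas rotating about the fixed axes in the same order (extrinsic) gives $R_{2}R_{1}$.

```python
import math

def axis_angle(axis, t):
    """Right-handed rotation by t radians about a unit axis (Rodrigues' formula)."""
    x, y, z = axis
    c, s = math.cos(t), math.sin(t)
    v = 1 - c
    return [
        [c + x * x * v,     x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v,     y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix with a length-3 vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def max_abs_diff(a, b):
    """Largest entry-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

e1, e2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # orthogonal unit axes (example choice)
theta1, theta2 = 0.4, 1.1                  # arbitrary example angles in radians

R1 = axis_angle(e1, theta1)
R2 = axis_angle(e2, theta2)

# Intrinsic: the second rotation is about e2 after it has been carried along by R1.
e2_rotated = matvec(R1, e2)
intrinsic = matmul(axis_angle(e2_rotated, theta2), R1)

# Extrinsic: both rotations are about the fixed axes, applied in the same order.
extrinsic = matmul(R2, R1)

gap_to_R1R2 = max_abs_diff(intrinsic, matmul(R1, R2))   # ~0 up to rounding
gap_to_extrinsic = max_abs_diff(intrinsic, extrinsic)   # clearly non-zero
print(gap_to_R1R2, gap_to_extrinsic)
```

The check works because a rotation about the rotated axis $R_{1}\bm{e_{2}}$ equals $R_{1}R_{2}R_{1}^{\top}$, and composing that with $R_{1}$ leaves $R_{1}R_{2}$.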

Figures (19)

  • Figure 1: Dlib's 68 facial landmarks [DBLP:journals/vc/ElmahmudiU21]
  • Figure 2: Demonstration of successive application of rotations in the case of $\theta_{1} = \theta_{2} = 90^\circ$
  • Figure 3: The right-handed coordinate system used by SciPy and Wikipedia
  • Figure 4: 300W-LP's elemental rotations
  • Figure 5: 300W-LP's Euler angle illustration does not align with what we inferred from its source code
  • ...and 14 more figures

Theorems & Definitions (17)

  • Definition 3.1
  • Definition 3.2
  • Lemma 3.3
  • proof
  • Theorem 3.4
  • proof
  • Definition 4.1
  • Lemma 9.1
  • proof
  • Definition 11.1
  • ...and 7 more