On the power of data augmentation for head pose estimation

Michael Welter

TL;DR

On the modeling side, the paper proposes a novel multitask head/loss design that includes uncertainty estimation. The resulting models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.

Abstract

Deep learning has been impressively successful in the last decade in predicting human head poses from monocular images. However, for in-the-wild inputs the research community relies predominantly on a single training set, 300W-LP, which is semisynthetic and has few alternatives. This paper focuses on gradually extending and improving the data to further explore the performance achievable with augmentation and synthesis strategies. On the modeling side, a novel multitask head/loss design that includes uncertainty estimation is proposed. Overall, the resulting models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.

Paper Structure (13 sections, 4 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Visualization of the samples from AFLW2000-3D with the worst rotation error. Predictions are blue, ground truth is green. Shown are axes of the local coordinate system, landmarks and bounding box.
  • Figure 2: Visualization of the samples from Biwi with the worst rotation error, analogous to Figure 1.
  • Figure 3: Geodesic error of rotation predictions versus the standard deviation $\sigma$ of Gaussian noise added to the input images. The evaluation is conducted on AFLW2000-3D with noise added. The error bars show the standard error of the sample mean over the five evaluation networks.
  • Figure 4: Correlation of the uncertainty estimate with rotation errors. Each data point corresponds to one sample from AFLW2000 evaluated by one of the five BL evaluation networks. Recall that the uncertainty estimate ${\bf \hat{\Sigma}}_{rot}$ is a covariance matrix; its Frobenius norm is plotted to condense it to a single number.
  • Figure 5: The underlying mesh for rendering images with out-of-plane rotations.
  • ...and 4 more figures
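
The captions of Figures 3 and 4 refer to two scalar quantities: the geodesic error between a predicted and a ground-truth rotation, and the Frobenius norm of a covariance matrix. The following is a minimal sketch of both using their standard textbook definitions. It is not taken from the paper, whose exact formulas are not shown in this excerpt. The function names and the `rot_z` helper are hypothetical and exist only for illustration.

```python
import math


def matmul_transposed(a, b):
    """Return a^T b for two 3x3 matrices given as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def geodesic_error_deg(r_pred, r_true):
    """Angle in degrees of the relative rotation r_pred^T r_true.

    Standard definition: theta = arccos((trace(R) - 1) / 2).
    """
    rel = matmul_transposed(r_pred, r_true)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def frobenius_norm(m):
    """Square root of the sum of squared entries of a matrix."""
    return math.sqrt(sum(x * x for row in m for x in row))


def rot_z(deg):
    """Hypothetical helper: rotation about the z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # Two rotations about the same axis that differ by 30 degrees.
    print(geodesic_error_deg(rot_z(10.0), rot_z(40.0)))
    # A made-up diagonal covariance, condensed to a single number.
    print(frobenius_norm([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]))
```

The geodesic error is symmetric in its two arguments and does not depend on the chosen Euler-angle convention, which makes it convenient as a single summary of rotation accuracy. The Frobenius norm is one of several ways to reduce a covariance matrix to a scalar; the caption of Figure 4 states that this is the reduction used for plotting.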