Location-guided Head Pose Estimation for Fisheye Image

Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, Dah-Jye Lee

TL;DR

This work tackles head pose estimation (HPE) from fisheye images, where radial distortion degrades models trained on rectilinear images. It introduces an end-to-end multi-task CNN that jointly estimates head pose and head location in polar coordinates, without image rectification or camera calibration, and trains on synthetic fisheye variants of standard datasets (BIWI-360, 300W-LP-360, and AFLW2000-360) plus a real-world fisheye dataset. By incorporating a location-guided feature extraction pathway and a dedicated location-estimation branch, the approach achieves higher accuracy than both two-stage rectification baselines and retrained one-stage methods on several fisheye benchmarks, while maintaining practical processing speed. The results demonstrate the practical value of location-aware distortion cues for robust HPE in ultra-wide FOV imagery, with potential extensions to top-view or ceiling-mounted fisheye setups via expanded datasets.

Abstract

A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection. Severe fisheye lens distortion in the peripheral region of the image degrades the performance of existing head pose estimation models trained on undistorted images. This paper presents a new approach to head pose estimation that uses knowledge of the head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network that estimates the head pose through multi-task learning of head pose and head location. Our proposed network estimates the head pose directly from the fisheye image, without rectification or calibration. We also created fisheye-distorted versions of three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000, for our experiments. Experimental results show that our network remarkably improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.
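
To make the notion of head location in polar coordinates concrete, the following is a minimal sketch of how a head's position might be expressed as a normalized radial distance and an angle. It is not the paper's implementation: the function name `head_location_polar`, the assumption that the fisheye image circle is centered in the frame, and the choice of half the shorter image side as the normalizing radius are all illustrative assumptions.

```python
import math


def head_location_polar(cx, cy, width, height):
    """Return (r_norm, theta) for a head center (cx, cy) given in pixels.

    Hypothetical helper, not taken from the paper.
    Assumptions: the fisheye image circle is centered in the frame, and its
    radius equals half of the shorter image side.
    """
    x0, y0 = width / 2.0, height / 2.0      # assumed distortion center
    radius = min(width, height) / 2.0       # assumed image-circle radius
    dx, dy = cx - x0, cy - y0
    r_norm = math.hypot(dx, dy) / radius    # 0 at the center, 1 at the rim
    theta = math.atan2(dy, dx)              # angle in radians, in (-pi, pi]
    return r_norm, theta


# A head halfway between the center and the right edge of a 1000x1000 image.
r_norm, theta = head_location_polar(750, 500, 1000, 1000)
print(r_norm, theta)
```

Under these assumptions, larger values of `r_norm` correspond to the peripheral region of the image, where the abstract notes that distortion is most severe.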

Paper Structure (19 sections, 16 equations, 10 figures, 9 tables, 1 algorithm)

Figures (10)

  • Figure 1: Examples of fisheye images obtained from (a) BIWI-360 dataset, (b) a fisheye camera (JR$^\circledR$HF900).
  • Figure 2: Average head pose estimation error in fisheye images versus normalized radial distance. The solid line is the result on the fisheye-distorted BIWI-360 dataset. The dotted line is the result on the non-distorted BIWI dataset.
  • Figure 3: The overview of the proposed network.
  • Figure 4: The location feature extraction module based on the attention mechanism of Woo et al. (2018), which includes two sequential submodules: (a) a channel attention submodule and (b) a spatial attention submodule.
  • Figure 5: The structure of the pose estimation module.
  • ...and 5 more figures