Human Orientation Estimation under Partial Observation

Jieting Zhao, Hanjing Ye, Yu Zhan, Hao Luan, Hong Zhang

TL;DR

This work tackles human orientation estimation from monocular images under partial observation, a critical capability for reliable Robot Person Following (RPF). It introduces Part-HOE, a transformer-based architecture that leverages a 23-joint human representation (including feet) and auxiliary joint detection to robustly estimate yaw from visible cues, plus a confidence predictor learned through a self-supervised adversarial objective. The model discretizes orientation into a $72$-class distribution with a circular Gaussian interpretation and yields a confidence score to filter unreliable predictions. Across three datasets and real robot experiments, Part-HOE achieves state-of-the-art HOE accuracy under partial observation, reduces computational cost relative to baselines, and improves the robustness and consistency of RPF tasks. The approach offers practical impact for autonomous agents operating in the wild, where occlusions and partial views are common.
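The 72-class discretization and confidence filtering mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it only shows one common way to turn a yaw angle into a circular Gaussian distribution over 72 bins (5° each, since 360/72 = 5) and to discard low-confidence estimates. The spread `sigma_deg`, the threshold value, and all function names are assumptions made for illustration.

```python
import numpy as np

NUM_BINS = 72                # from the text: a 72-class orientation distribution
BIN_DEG = 360.0 / NUM_BINS   # 5 degrees per class


def circular_gaussian_label(yaw_deg, sigma_deg=10.0):
    """Soft label over NUM_BINS classes peaked at yaw_deg (sigma_deg is an assumed value)."""
    centers = np.arange(NUM_BINS) * BIN_DEG
    # Shortest angular distance from each bin center to the target, wrapping at 360.
    diff = np.abs(centers - (yaw_deg % 360.0))
    diff = np.minimum(diff, 360.0 - diff)
    weights = np.exp(-0.5 * (diff / sigma_deg) ** 2)
    return weights / weights.sum()


def decode_yaw(probs):
    """Return the bin-center angle (degrees) of the most likely class."""
    return float(np.argmax(probs) * BIN_DEG)


def filter_by_confidence(yaw_deg, confidence, threshold=0.5):
    """Keep the estimate only if confidence reaches the (hypothetical) threshold."""
    return yaw_deg if confidence >= threshold else None


label = circular_gaussian_label(357.0)
print(decode_yaw(label))
print(filter_by_confidence(decode_yaw(label), confidence=0.2))
```

Because the distance wraps around at 360°, a target of 357° places more probability mass on the 0° bin than on the 350° bin, which a non-circular Gaussian would not do.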

Abstract

Reliable Human Orientation Estimation (HOE) from a monocular image is critical for autonomous agents to understand human intention. Significant progress has been made in HOE under full observation. However, existing methods are prone to wrong predictions under partial observation and assign those predictions unexpectedly high confidence. To address these problems, this study first develops a method called Part-HOE, which estimates orientation from the visible joints of a target person and can therefore handle partial observation. We then introduce a confidence-aware orientation estimation method that enables more accurate orientation estimates and reasonable confidence estimates under partial observation. The effectiveness of our method is validated on both public and custom-built datasets, where it substantially improves accuracy and reliability in partial observation scenarios. In particular, we show in real-world experiments that our method improves the robustness and consistency of the Robot Person Following (RPF) task.

Paper Structure (22 sections, 8 equations, 7 figures, 2 tables)

This paper contains 22 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Framework Overview: An example of partial observation in robot person following. Existing methods for Human Orientation Estimation (HOE) struggle in this scenario. We propose an occlusion-robust method, Part-HOE, that uses visible joints to help target state estimation and to improve RPF performance.
  • Figure 2: Our Part-HOE method takes an RGB image as input and extracts features through the ViT backbone. Then, three decoder modules output orientation estimation and confidence estimation, along with 2D human joint detection. Finally, the network is learned by multi-task training.
  • Figure 3: An explanation of Circular Gaussian Probability in the interpolation operation of Part-HOE.
  • Figure 4: Precision-recall curve under partial observation. The dashed lines indicate the max recall at 100% precision.
  • Figure 5: Comparison of different RPF methods in real robot experiments. This comparison shows the absolute trajectory error (ATE) for different RPF methods, with the green box representing our method, the blue box representing MEBOW, and the red box representing the traditional RPF method that relies solely on human velocity for orientation. The lines above and below the dashed lines indicate the maximum and minimum values, while the red line represents the mean. In both person-following scenarios, (a) backward RPF task evaluation and (b) forward RPF task evaluation, using Part-HOE for orientation estimation gives the best person-following performance.
  • ...and 2 more figures
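
The "max recall at 100% precision" quantity named in the Figure 4 caption can be computed from per-sample confidence scores as sketched below. This is a minimal illustration, not the paper's evaluation code: how a prediction is judged correct (for example, an angular error within some tolerance) is not stated here, so `is_correct` is taken as a given input, and the function name is hypothetical.

```python
import numpy as np


def max_recall_at_full_precision(confidences, is_correct):
    """Largest recall achievable by a confidence threshold that accepts no wrong prediction."""
    conf = np.asarray(confidences, dtype=float)
    correct = np.asarray(is_correct, dtype=bool)
    total_correct = int(correct.sum())
    if total_correct == 0:
        return 0.0
    best = 0.0
    # Try every observed confidence value as an acceptance threshold.
    for t in np.unique(conf):
        accepted = conf >= t
        true_pos = int(np.sum(accepted & correct))
        # Precision is 100% when every accepted prediction is correct.
        if true_pos == int(accepted.sum()):
            best = max(best, true_pos / total_correct)
    return float(best)


# Toy data (made up for illustration): the third-most-confident prediction is wrong.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
flags = [True, True, False, True, True]
print(max_recall_at_full_precision(scores, flags))  # 0.5
```

A higher value means the confidence score separates reliable from unreliable predictions better, since more correct estimates can be kept before the first wrong one is accepted.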