Human Orientation Estimation under Partial Observation
Jieting Zhao, Hanjing Ye, Yu Zhan, Hao Luan, Hong Zhang
TL;DR
This work tackles human orientation estimation from monocular images under partial observation, a critical capability for reliable Robot Person Following (RPF). It introduces Part-HOE, a transformer-based architecture that leverages a 23-joint human representation (including feet) and auxiliary joint detection to robustly estimate yaw from visible cues, plus a confidence predictor learned through a self-supervised adversarial objective. The model discretizes orientation into a $72$-class distribution with a circular Gaussian interpretation and yields a confidence score to filter unreliable predictions. Across three datasets and real robot experiments, Part-HOE achieves state-of-the-art HOE accuracy under partial observation, reduces computational cost relative to baselines, and improves the robustness and consistency of RPF tasks. The approach offers practical impact for autonomous agents operating in the wild, where occlusions and partial views are common.
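The 72-class discretization with a circular Gaussian interpretation and the confidence-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian width `sigma_deg`, the argmax decoding, and the confidence `threshold` are all assumptions chosen for illustration, and the confidence value is simply passed in as a number rather than produced by a learned predictor. Only the number of classes (72, i.e. 5° per class) comes from the text.

```python
import numpy as np

NUM_BINS = 72                  # from the text: 72 orientation classes
BIN_WIDTH = 360.0 / NUM_BINS   # 5 degrees per class


def circular_gaussian_label(yaw_deg, sigma_deg=10.0):
    """Soft target over NUM_BINS classes, peaked at yaw_deg and wrapping at 360.

    sigma_deg is an assumed width, not a value taken from the source.
    """
    centers = np.arange(NUM_BINS) * BIN_WIDTH
    # Shortest angular distance on the circle, always in [0, 180].
    diff = np.abs(centers - (yaw_deg % 360.0))
    diff = np.minimum(diff, 360.0 - diff)
    weights = np.exp(-0.5 * (diff / sigma_deg) ** 2)
    return weights / weights.sum()


def decode_yaw(probs):
    """Return the center angle in degrees of the most probable class."""
    return float(np.argmax(probs) * BIN_WIDTH)


def filter_prediction(probs, confidence, threshold=0.5):
    """Return the decoded yaw, or None when confidence is below threshold.

    threshold is a hypothetical cutoff; the source does not state one.
    """
    if confidence < threshold:
        return None
    return decode_yaw(probs)
```

Because the distance is measured on the circle, a yaw of 357° places most of its mass on the 355° and 0° classes rather than treating them as far apart, which is the point of the circular formulation. A downstream consumer such as a person-following controller could skip any frame for which `filter_prediction` returns `None`.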
Abstract
Reliable Human Orientation Estimation (HOE) from a monocular image is critical for autonomous agents to understand human intention. Significant progress has been made in HOE under full observation. However, existing methods often make wrong predictions under partial observation and assign them unexpectedly high confidence. To address these problems, we first develop Part-HOE, a method that estimates orientation from the visible joints of a target person and can therefore handle partial observation. We then introduce a confidence-aware orientation estimation method that yields more accurate orientation estimates and more reasonable confidence estimates under partial observation. We validate our method on both public and custom-built datasets, where it shows substantial improvements in accuracy and reliability in partial observation scenarios. In particular, real-world experiments show that our method benefits the robustness and consistency of the Robot Person Following (RPF) task.
