Monocular Person Localization under Camera Ego-motion

Yu Zhan, Hanjing Ye, Hong Zhang

TL;DR

This work tackles monocular person localization when the host robot induces significant camera ego-motion. It introduces an optimization-based framework that represents a person as four collinear points and jointly estimates the camera attitude and the person’s 3D footprint. It does this by minimizing a weighted reprojection error on a normalized image plane, solved via nonlinear least squares with a robust loss. The method is integrated into a Robot Person Following (RPF) system that runs in real time on a quadruped robot. It is evaluated on public datasets and a new RPF-Quadruped dataset, where it outperforms geometric-model and learning-based baselines under challenging ego-motion. Because it localizes the person frame by frame and relies less on odometry, it supports stable long-term following over rough terrain. The dataset is released publicly to facilitate further research.
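Read literally, the TL;DR describes a cost of the form below. This is a hedged reconstruction, not the paper's own equation. All symbols are our assumptions: roll φ and pitch θ for the 2D camera attitude; the person's ground contact point p, expressed in a gravity-aligned camera frame with the y axis pointing down, so the keypoint h_k above the ground is p − h_k e_y; keypoint confidence weights w_k; a robust loss ρ; and the projection π onto the normalized image plane, where ū_k are the observed normalized keypoints.

```latex
\min_{\phi,\,\theta,\,\mathbf{p}} \; \sum_{k=1}^{4} \rho\!\Big( w_k \,\big\| \pi\big(\mathbf{R}(\phi,\theta)\,(\mathbf{p} - h_k\,\mathbf{e}_y)\big) - \bar{\mathbf{u}}_k \big\|^2 \Big),
\qquad \pi\big([X,\,Y,\,Z]^\top\big) = \big[X/Z,\; Y/Z\big]^\top
```

Under this reading there are five unknowns (two attitude angles plus three position coordinates) and up to eight scalar measurements from the four keypoints. Yaw can be left out because a rotation about the gravity axis leaves the vertical offsets h_k e_y unchanged and can be absorbed into p. That is one plausible reason only a 2D attitude needs to be estimated.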

Abstract

Localizing a person from a moving monocular camera is critical for Human-Robot Interaction (HRI). To estimate the 3D human position from a 2D image, existing methods either rely on the geometric assumption of a fixed camera or use a position regression model trained on datasets containing little camera ego-motion. These methods are vulnerable to severe camera ego-motion, which leads to inaccurate person localization. We treat person localization as part of a pose estimation problem. By representing a human with a four-point model, our method jointly estimates the 2D camera attitude and the person's 3D location through optimization. Evaluations on both public datasets and real robot experiments demonstrate that our method outperforms baselines in person localization accuracy. We further integrate our method into a person-following system and deploy it on an agile quadruped robot.
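A minimal sketch of the joint estimation, under stated assumptions. The person is modeled as an upright vertical line with four keypoints at hypothetical heights (HEIGHTS). The unknowns are roll, pitch, and the ground contact point in a gravity-aligned camera frame. SciPy's `least_squares` with a Huber loss stands in for the nonlinear least-squares solver and robust loss, which the source does not name. All function names and parameter values here are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical heights (m) of four collinear keypoints above the ground
# contact point, e.g. foot, hip, neck, head. Illustrative values only.
HEIGHTS = np.array([0.0, 0.9, 1.45, 1.65])


def rotation(roll, pitch):
    """R = Rz(roll) @ Rx(pitch): maps a gravity-aligned frame (x right, y down,
    z forward) into the tilted camera frame. Yaw is omitted because a rotation
    about gravity can be absorbed into the person's position."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return rz @ rx


def project(roll, pitch, foot):
    """Project the four upright keypoints onto the normalized plane (x/z, y/z)."""
    up = np.array([0.0, -1.0, 0.0])                  # y points down, so up is -y
    pts = foot[None, :] + HEIGHTS[:, None] * up[None, :]   # (4, 3) points
    cam = pts @ rotation(roll, pitch).T              # rotate into camera frame
    return cam[:, :2] / cam[:, 2:3]                  # (4, 2) normalized coords


def residuals(params, obs, weights):
    """Weighted reprojection residuals; weight 0 disables a keypoint.
    Note: SciPy applies the robust loss to each scalar residual separately."""
    err = project(params[0], params[1], params[2:]) - obs
    return (np.sqrt(weights)[:, None] * err).ravel()


def localize(obs, weights, init=(0.0, 0.0, 0.0, 0.5, 3.0)):
    """Estimate [roll, pitch, x, y, z] from (4, 2) normalized keypoints.
    Rows containing NaN mark unobserved keypoints and get zero weight."""
    obs = np.asarray(obs, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    w[np.isnan(obs).any(axis=1)] = 0.0
    obs = np.where(np.isnan(obs), 0.0, obs)
    # Crude bounds: level-ish camera, person at least 0.3 m in front.
    lb = np.array([-np.pi / 2, -np.pi / 2, -np.inf, -np.inf, 0.3])
    ub = np.array([np.pi / 2, np.pi / 2, np.inf, np.inf, np.inf])
    sol = least_squares(residuals, np.asarray(init, dtype=float),
                        args=(obs, w), bounds=(lb, ub),
                        loss="huber", f_scale=0.02)
    return sol.x
```

With noise-free synthetic keypoints from `project`, `localize` should recover the generating pose. A NaN row, as for the unobservable neck point in Fig. 5(d), simply drops that keypoint from the cost.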

Paper Structure

This paper contains 17 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: A scenario of a quadruped robot following a person through a rugged lawn. The robot view is from an onboard panoramic camera (see the robot platform section). The robot's dynamic motion induces severe camera ego-motion and vibration, which make person localization challenging.
  • Figure 2: The geometry of our observation model. (a) In the raw camera-centric view, the person appears tilted due to the robot's ego-motion. (b) Our model assumes an upright person, representing the ego-motion as a corresponding tilt of the camera.
  • Figure 3: Our proposed framework for monocular Robot Person Following (RPF). The modules highlighted in orange represent our key contributions: (1) a normalization step for camera-agnostic processing, and (2) a subsequent optimization-based person localization method.
  • Figure 4: (a) Our quadruped robot platform. (b-d) Scenarios from our RPF-Quadruped dataset.
  • Figure 5: Screenshots of our method running on different datasets: (a) KITTI, (b) FieldSAFE, (c-d) fisheye and pinhole images in Rugged Lawn. The neck point in (d) is not observable because the person is too close to the camera.
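The normalization step in Fig. 3 and the mix of fisheye and pinhole images in Fig. 5 suggest mapping every keypoint onto the same normalized image plane before optimization. Below is a minimal sketch of that step. It assumes an ideal pinhole model and an equidistant fisheye model (r = f·θ). The source does not specify the actual camera models or calibration, so both models and the function names are assumptions.

```python
import numpy as np


def normalize_pinhole(uv, fx, fy, cx, cy):
    """Map pixel coordinates (N, 2) of an ideal pinhole camera to (x/z, y/z)."""
    uv = np.asarray(uv, dtype=float)
    return np.stack([(uv[:, 0] - cx) / fx, (uv[:, 1] - cy) / fy], axis=1)


def normalize_equidistant(uv, fx, fy, cx, cy):
    """Assumed equidistant fisheye model: the focal-normalized radius equals the
    angle theta from the optical axis, and the pinhole radius is tan(theta).
    Rays at or beyond 90 degrees have no image on the normalized plane (NaN)."""
    m = normalize_pinhole(uv, fx, fy, cx, cy)     # radius of m equals theta
    theta = np.linalg.norm(m, axis=1)
    scale = np.ones_like(theta)                   # limit of tan(t)/t at t=0
    nz = theta > 1e-12
    scale[nz] = np.tan(theta[nz]) / theta[nz]
    scale[theta >= np.pi / 2] = np.nan            # behind or on the image plane
    return m * scale[:, None]
```

Once both camera types yield the same (x/z, y/z) coordinates, the downstream localization can stay camera-agnostic. Only this front-end function changes per camera.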