Active Human Pose Estimation via an Autonomous UAV Agent
Jingxi Chen, Botao He, Chahat Deep Singh, Cornelia Fermuller, Yiannis Aloimonos
TL;DR
The paper tackles occlusion-driven challenges in 2D human pose estimation from UAV videos by proposing an integrated, autonomous system. It combines NeRF-based drone-view data generation, an on-board PoseErrNet that estimates a 3D perception guidance field from 2D pose observations, and a perception-aware planner that fuses this guidance with UAV dynamics to select feasible camera viewpoints. Key contributions include a drone-view data generation framework, an efficient on-board network for next-view estimation, and a combined planner that ensures perception quality while respecting safety constraints. The approach demonstrates improved pose estimation accuracy and safe navigation in both simulated and real-world scenarios, with potential impact on aerial cinematography and surveillance tasks.
Abstract
One of the core activities of an active observer involves moving to secure a "better" view of the scene, where the definition of "better" is task-dependent. This paper focuses on the task of human pose estimation from videos capturing a person's activity. Self-occlusions within the scene can complicate or even prevent accurate human pose estimation. To address this, the camera must be relocated to a new vantage point that clears the view and thereby improves 2D human pose estimation. This paper formalizes the process of achieving an improved viewpoint. Our proposed solution comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner that devises a feasible motion plan to reposition the camera based on the predicted errors for camera views. The Data Generation Framework uses NeRF-based methods to generate a comprehensive dataset of human poses and activities, enhancing the drone's adaptability to various scenarios. The Camera View Error Estimation Network evaluates the current human pose and identifies the most promising next viewing angles for the drone, so that pose estimation from those angles is reliable and precise. Finally, the Combined Planner incorporates these angles while accounting for the drone's physical and environmental limitations, employing efficient algorithms to plan safe and effective flight paths. This system represents a significant advancement in active 2D human pose estimation for an autonomous UAV agent, offering substantial potential for applications in aerial cinematography by improving the performance of autonomous human pose estimation while maintaining the operational safety and efficiency of UAVs.
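The interplay between predicted per-view errors and the planner's feasibility constraints can be illustrated with a minimal sketch. This is not the paper's network or planner: it assumes candidate viewpoints evenly spaced on a circle around the person, takes the predicted pose errors as a plain list (a stand-in for the error estimation network's output), and models the drone's limitations as a boolean feasibility mask plus a travel penalty. The names `select_next_view`, `pred_errors`, `feasible`, and `travel_weight` are hypothetical.

```python
import math


def select_next_view(pred_errors, feasible, current_idx, travel_weight=0.1):
    """Pick the index of the next candidate view, or None if none is feasible.

    Assumptions (illustrative only, not taken from the paper):
    - candidate views are evenly spaced on a circle around the person;
    - pred_errors[i] is a predicted 2D pose error for view i (lower is better);
    - feasible[i] is False when view i is unreachable or unsafe;
    - travel cost grows linearly with the angular distance to the view.
    """
    n = len(pred_errors)
    if len(feasible) != n:
        raise ValueError("pred_errors and feasible must have the same length")

    best_idx, best_cost = None, math.inf
    for i in range(n):
        if not feasible[i]:
            continue  # skip views the planner could not reach safely
        step = abs(i - current_idx)
        hops = min(step, n - step)  # shortest way around the circle
        angular_dist = hops * (2 * math.pi / n)
        cost = pred_errors[i] + travel_weight * angular_dist
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


# Eight candidate views; view 3 has the lowest predicted error but is blocked.
errors = [0.9, 0.5, 0.2, 0.1, 0.6, 0.8, 0.7, 0.4]
blocked_3 = [True, True, True, False, True, True, True, True]
next_view = select_next_view(errors, blocked_3, current_idx=0)
```

In this toy setting the view with the lowest predicted error is skipped because it is infeasible, and the selection falls back to the best reachable alternative. A real planner would replace the mask and the angular penalty with trajectory optimization under the UAV's dynamics.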
