SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
TL;DR
The paper presents SimpleDepthPose, a fast, training-free RGBD-based approach for multi-view, multi-person 3D pose estimation. It predicts 2D joints from RGB frames, computes per-joint depths from aligned depth images using a cross-shaped neighborhood with per-joint offsets, and transforms the joints to world coordinates. The resulting 3D pose proposals are tracked across frames and merged by filtering outliers and averaging the top-k proposals, which enables robust multi-view fusion without neural refinement. Evaluations on the MVOR and Panoptic datasets show strong generalization to unseen data, high detection rates, and fast runtimes, highlighting depth data as a key enabler of robustness in occluded scenes. The method's simplicity and speed, along with its publicly available code, make it a practical option for real-time, multi-person 3D pose estimation wherever depth information is available.
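The per-joint pipeline summarized above (depth lookup in a cross-shaped neighborhood, back-projection, transformation to world coordinates, and outlier-filtered averaging across views) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a camera-to-world rotation `R` and translation `t`, aggregates the cross with a median, rejects outliers by their distance to a per-axis median point, and ranks the remaining proposals by a detection score before averaging the top `k`. All function names and the `radius`, `offset`, `k` and `max_dist` values are hypothetical, and tracking across frames is omitted.

```python
import math
from statistics import median


def cross_depth(depth, u, v, radius=2, offset=0.0):
    """Depth for a joint at pixel (u, v) from a cross-shaped neighborhood.

    depth: 2D list indexed depth[row][col], in meters; 0 marks invalid pixels.
    radius: arm length of the cross in pixels (hypothetical default).
    offset: per-joint offset added to the surface depth, e.g. to move the
            point from the skin toward the joint center (hypothetical value).
    Returns None if no valid depth is found.
    """
    h, w = len(depth), len(depth[0])
    # Vertical and horizontal arms; the set avoids counting the center twice.
    cells = {(v + d, u) for d in range(-radius, radius + 1)}
    cells |= {(v, u + d) for d in range(-radius, radius + 1)}
    samples = [depth[r][c] for r, c in cells
               if 0 <= r < h and 0 <= c < w and depth[r][c] > 0]
    if not samples:
        return None
    # Median aggregation is an assumption; it suppresses isolated bad pixels.
    return median(samples) + offset


def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def cam_to_world(p, R, t):
    """Apply a camera-to-world rotation R (3x3 nested lists) and translation t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def merge_joint(proposals, k=3, max_dist=0.2):
    """Fuse one joint's 3D proposals from several views.

    proposals: list of ((x, y, z), score) in world coordinates (meters).
    Proposals farther than max_dist from the per-axis median are dropped,
    then the k highest-scoring inliers are averaged. Ranking by score and
    the median reference point are assumptions.
    """
    if not proposals:
        return None
    ref = tuple(median(p[i] for p, _ in proposals) for i in range(3))
    inliers = [(p, s) for p, s in proposals if math.dist(p, ref) <= max_dist]
    if not inliers:
        return None
    top = sorted(inliers, key=lambda ps: ps[1], reverse=True)[:k]
    return tuple(sum(p[i] for p, _ in top) / len(top) for i in range(3))


if __name__ == "__main__":
    # Three consistent views plus one high-scoring outlier view.
    views = [((0.0, 0.0, 2.0), 0.9), ((0.02, 0.0, 2.0), 0.8),
             ((0.0, 0.01, 2.0), 0.7), ((1.0, 1.0, 3.0), 0.95)]
    print(merge_joint(views, k=2))
```

In the example, the outlier view is rejected despite having the highest score, because filtering happens before ranking. A cross instead of a full square window keeps the number of depth lookups linear in the radius, while still giving the median enough samples to skip an invalid or background pixel at the joint location.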
Abstract
In the rapidly advancing domain of computer vision, accurately estimating the poses of multiple individuals from various viewpoints remains a significant challenge, especially when reliability is a key requirement. This paper introduces a novel algorithm that excels in multi-view, multi-person pose estimation by incorporating depth information. An extensive evaluation demonstrates that the proposed algorithm not only generalizes well to unseen datasets and achieves fast runtimes, but is also adaptable to different keypoint sets. To support further research, all of the work is publicly available.
