NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images
Jingrui Yu, Dipankar Nandi, Roman Seidel, Gangolf Hirtz
TL;DR
This work tackles the lack of large-scale top-view fisheye datasets for human pose estimation by introducing NToP, a NeRF-powered pipeline that converts existing 2D/3D datasets into semi-synthetic top-view data with ground-truth 2D and 3D keypoints. The authors render over 570K images (NToP570K) using virtual fisheye cameras and provide OmniLab, a real-world top-view dataset, to validate cross-domain performance. Finetuning ViTPose-B on NToP-train improves 2D AP by 33.3% on the NToP validation set, and a HybrIK-Transformer finetuned on NToP-train reduces PA-MPJPE by 53.7 mm for 3D HPE, showing the utility of semi-synthetic data. The results indicate that NToP improves top-view HPE performance and reduces domain gaps relative to existing datasets, with potential extensions to multi-view and temporal data and to more efficient NeRF-based rendering.
Abstract
Human pose estimation (HPE) from the top view using fisheye cameras is a promising and innovative application domain. However, datasets capturing this viewpoint are extremely scarce, especially those with high-quality 2D and 3D keypoint annotations. To address this gap, we use Neural Radiance Fields (NeRF) to build a comprehensive pipeline that generates human pose datasets from existing 2D and 3D datasets, specifically tailored to the top-view fisheye perspective. With this pipeline, we create a novel dataset, NToP570K (NeRF-powered Top-view human Pose dataset for fisheye cameras, with over 570 thousand images), and extensively evaluate how well it improves neural networks for 2D and 3D top-view human pose estimation. A pretrained ViTPose-B model achieves an improvement in AP of 33.3% on our validation set for 2D HPE after finetuning on our training set. A similarly finetuned HybrIK-Transformer model achieves a 53.7 mm reduction in PA-MPJPE for 3D HPE on the validation set.
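For readers unfamiliar with the 3D metric quoted above, the following is a minimal sketch of how PA-MPJPE (Procrustes-aligned mean per-joint position error) is commonly computed. It is not taken from the authors' evaluation code: the function names `procrustes_align` and `pa_mpjpe` are hypothetical, and the sketch assumes a single pose given as a (J, 3) array of joint positions in millimetres.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation.

    Hypothetical helper for illustration; assumes pred is not a single
    repeated point (otherwise the scale is undefined).
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g          # centre both point sets
    H = p.T @ g                            # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                     # optimal rotation (Kabsch)
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """Mean Euclidean joint error after Procrustes alignment (same unit as input)."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3)) * 100.0              # synthetic pose, in mm
    pred = gt + rng.normal(scale=10.0, size=gt.shape)  # noisy prediction
    print(f"PA-MPJPE: {pa_mpjpe(pred, gt):.2f} mm")
```

Because the alignment removes global rotation, scale, and translation, the metric measures only errors in the articulated pose itself, which is why it is reported alongside or instead of the unaligned MPJPE.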
