RCR: Robust Crowd Reconstruction with Upright Space from a Single Large-scene Image

Jing Huang; Hao Wen; Tianyi Zhou; Haozhe Lin; Yu-kun Lai; Kun Li

TL;DR

This work tackles monocular crowd reconstruction in large-scene images with unknown camera parameters and arbitrary FoVs. It introduces the Human-scene Virtual Interaction Point (HVIP) to resolve depth ambiguity, and a canonical Upright 2D/3D Space with Upright Normalization to decouple camera effects from reconstruction. Iterative Ground-aware Cropping complements these by handling humans at multiple scales. The resulting method, Robust Crowd Reconstruction (RCR), produces globally consistent reconstructions in a unified camera space without test-time optimization, and is supported by two new datasets, LargeCrowd and SynCrowd. Experimental results demonstrate improved reprojection accuracy and spatial consistency. The code and data are to be released for research use.

Abstract

This paper focuses on spatially consistent reconstruction of the pose and shape of hundreds of people from a single large-scene image, with widely varying human scales and arbitrary camera fields of view (FoVs). Because of the small and highly varying 2D human scales, depth ambiguity, and perspective distortion, no existing method achieves globally consistent reconstruction with correct reprojection. To address these challenges, we first propose a new concept, the Human-scene Virtual Interaction Point (HVIP), which converts complex 3D human localization into 2D pixel localization. We then extend it to RCR (Robust Crowd Reconstruction), which achieves globally consistent reconstruction and generalizes stably across camera FoVs without test-time optimization. To perceive humans at varying pixel sizes, we propose Iterative Ground-aware Cropping, which automatically crops the image and then merges the per-crop results. To eliminate the influence of the camera and of the cropping process on the reconstruction, we introduce a canonical Upright 3D Space and the corresponding Upright 2D Space. To link the canonical space and the camera space, we propose Upright Normalization, which transforms the local crop input into the Upright 2D Space and transforms the output from the Upright 3D Space into the unified camera space. In addition, we contribute two benchmark datasets, LargeCrowd and SynCrowd, for evaluating crowd reconstruction in large scenes. Experimental results demonstrate the effectiveness of the proposed method. The source code and data will be made publicly available for research purposes.
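
To make the idea of turning 3D localization into 2D pixel localization concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how a predicted pixel is lifted to 3D, so the sketch assumes a pinhole camera with known intrinsics and a known ground plane in camera coordinates, and intersects the camera ray through the pixel with that plane. The function name `backproject_to_ground` and all of its parameters are hypothetical.

```python
def backproject_to_ground(u, v, fx, fy, cx, cy, normal, offset):
    """Intersect the camera ray through pixel (u, v) with a ground plane.

    Assumptions (not taken from the paper): a pinhole camera at the origin
    looking along +z, and a plane given in camera coordinates by
    normal . X + offset = 0.
    """
    # Direction of the ray through the pixel, scaled so that z = 1.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    # Solve normal . (t * ray) + offset = 0 for the ray parameter t.
    t = -offset / denom
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return tuple(t * r for r in ray)


# Camera 1.5 units above the ground, image y-axis pointing down,
# so the ground plane is y = 1.5, i.e. normal (0, 1, 0) and offset -1.5.
point = backproject_to_ground(
    u=960.0, v=1040.0, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
    normal=(0.0, 1.0, 0.0), offset=-1.5,
)
```

Under these assumptions a single pixel on the ground fixes a 3D position, which is why a 2D point prediction can stand in for direct depth regression. A pixel on the horizon line (here `v = 540`) has no intersection with the plane, and the sketch raises `ValueError` in that case.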
