AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors

Xiaozhen Qiao; Wenjia Wang; Zhiyuan Zhao; Jiacheng Sun; Ping Luo; Hongyuan Zhang; Xuelong Li

TL;DR

AHAP achieves competitive performance on both world-space human reconstruction and camera pose estimation, while being 180$\times$ faster than optimization-based approaches.

Abstract

Reconstructing 3D humans from images captured from multiple perspectives typically requires pre-calibration, for example with checkerboards or MVS algorithms, which limits scalability and applicability in diverse real-world scenarios. In this work, we present \textbf{AHAP} (Reconstructing \textbf{A}rbitrary \textbf{H}umans from \textbf{A}rbitrary \textbf{P}erspectives), a feed-forward framework that reconstructs arbitrary humans from arbitrary camera perspectives without camera calibration. The core of our approach is the effective fusion of multi-view geometry to assist human association, reconstruction, and localization. Specifically, a Cross-View Identity Association module resolves cross-view human identity association through learnable person queries and soft assignment, supervised by contrastive learning. A Human Head fuses cross-view features and scene context for SMPL prediction, guided by cross-view reprojection losses that enforce body pose consistency. Additionally, multi-view geometry eliminates the depth ambiguity inherent in monocular methods, and multi-view triangulation provides more precise 3D human localization. Experiments on EgoHumans and EgoExo4D demonstrate that AHAP achieves competitive performance on both world-space human reconstruction and camera pose estimation, while being 180$\times$ faster than optimization-based approaches.
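To make the idea of soft assignment to person queries concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each detected person has a feature vector, scores it against a fixed set of identity queries with cosine similarity, and applies a temperature-scaled softmax. The function names `soft_assign` and `match_across_views`, the `temperature` value, and the argmax-based matching rule are all hypothetical choices for illustration; in AHAP the queries are learned and the module is supervised with a contrastive loss, which is omitted here.

```python
import numpy as np

def soft_assign(person_feats, queries, temperature=0.1):
    """Softly assign detected persons to identity queries.

    person_feats: (N, D) features of persons detected in one view.
    queries:      (K, D) person queries (learned in practice, fixed here).
    Returns an (N, K) matrix whose rows sum to 1.
    """
    # Cosine similarity between every detection and every query.
    f = person_feats / np.linalg.norm(person_feats, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    logits = f @ q.T / temperature
    # Numerically stable softmax over the query axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def match_across_views(assign_a, assign_b):
    """Pair detections from two views whose strongest query is the same."""
    ids_a = assign_a.argmax(axis=1)
    ids_b = assign_b.argmax(axis=1)
    return [(i, j)
            for i, qa in enumerate(ids_a)
            for j, qb in enumerate(ids_b)
            if qa == qb]
```

Because the assignment is a softmax rather than a hard argmax, it stays differentiable, which is what would allow a contrastive objective to pull features of the same person in different views toward the same query.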

Paper Structure (35 sections, 20 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
Related Works
Method
Overview
Visual Feature Encoding
Cross-View Identity Association
Arbitrary View Human Head
Training Details
Human-Scene Reconstruction
Experiments
Datasets and Metrics
Results
Human Mesh Recovery.
Camera Pose Estimation.
Efficiency Analysis.
...and 20 more sections

Figures (9)

Figure 1: (a) AHAP achieves 180$\times$ speedup over optimization-based HSfM \cite{muller2025reconstructing} while maintaining competitive accuracy. (b) Results on EgoHumans. (c) Results on EgoExo4D.
Figure 2: Overall pipeline of AHAP. Given multi-view images, the scene encoder \cite{lin2025depth} estimates scene geometry and camera poses, while the human encoder \cite{baradel2024multi} extracts human-centric features. Our cross-view identity association module matches the same person across views via learnable queries. The human head fuses scene tokens, aggregated tokens, and reference view tokens through a multi-view fusion decoder to predict SMPL parameters. Finally, we align humans and scene point clouds via scale alignment and multi-view triangulation for precise human localization.
Figure 3: PCA visualization of feature distributions. (a-d) Scene encoder \cite{lin2025depth} features; (e-h) Human encoder \cite{baradel2024multi} features. Human encoder features show stronger semantic clustering aligned with ground truth, while scene encoder provides complementary geometric information.
Figure 4: Multi-view triangulation for human position refinement. For persons visible in multiple views, we refine their 3D positions using DLT triangulation based on 2D pelvis observations and estimated camera poses, improving human-scene alignment.
Figure 5: Qualitative results. Visualization of human-scene reconstruction on EgoHumans and EgoExo4D. AHAP produces accurate human meshes within reconstructed scenes, maintaining consistent identity association across views.
...and 4 more figures
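
The caption of Figure 4 refers to DLT triangulation from 2D pelvis observations and estimated camera poses. The sketch below shows standard linear DLT triangulation of a single point in NumPy, under the assumption that each view is described by a 3x4 projection matrix P = K [R | t]. The function name `triangulate_dlt` and the input layout are illustrative assumptions, not the authors' code, and the paper's pipeline may weight or filter observations differently.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views with linear DLT.

    projections: sequence of (3, 4) projection matrices, P = K [R | t].
    points_2d:   sequence of (x, y) pixel observations, one per view.
    Returns the 3D point in world coordinates as a (3,) array.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        #   x * (P[2] @ X) - P[0] @ X = 0
        #   y * (P[2] @ X) - P[1] @ X = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noise-free observations the recovered point matches the true 3D point up to numerical precision. With noisy pelvis detections the result is a linear least-squares estimate, which is often used as an initialization for further refinement.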
