Mirror-Aware Neural Humans

Daniel Ajisafe, James Tang, Shih-Yang Su, Bastian Wandt, Helge Rhodin

TL;DR

This work tackles the challenge of recovering detailed 3D human pose, shape, and appearance from monocular video by leveraging a mirror as a second synchronized view. It introduces Mirror-Aware Neural Humans, a three-stage pipeline that automatically calibrates the camera and mirror, lifts 2D keypoints to a 3D skeleton, and refines a neural radiance field with a layered, occlusion-aware mirror representation. The key contributions are a robust mirror-calibration method, a bone-oriented skeleton representation, and a Layered A-NeRF framework that handles mirror occlusions, yielding improved pose accuracy and sharper appearance over prior mirror-free and single-view methods. The approach enables consumer-level, camera-and-mirror-based 3D motion capture with practical implications for low-cost rehabilitation and related applications.

Abstract

Human motion capture either requires multi-camera systems or is unreliable when using single-view input due to depth ambiguities. Meanwhile, mirrors are readily available in urban environments and form an affordable alternative by recording two views with only a single camera. However, the mirror setting poses the additional challenge of handling occlusions between the real and mirror images. Going beyond existing mirror approaches for 3D human pose estimation, we utilize mirrors for learning a complete body model, including shape and dense appearance. Our main contributions are extending articulated neural radiance fields to include a notion of a mirror and making them sample-efficient over potential occlusion regions. Together, our contributions realize a consumer-level 3D motion capture system that starts from off-the-shelf 2D poses by automatically calibrating the camera, estimating mirror orientation, and subsequently lifting 2D keypoint detections to a 3D skeleton pose that is used to condition the mirror-aware NeRF. We empirically demonstrate the benefit of learning a body model and accounting for occlusion in challenging mirror scenes.

Paper Structure

This paper contains 25 sections, 9 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Pose refinement and image quality. Given an image with a mirror (top left), our mirror-based method reconstructs 3D pose and shape more accurately than the baselines that do not support the mirror (A-NeRF [su2021nerf] and DANBO [su2022danbo]), both in terms of the 3D pose metric PA-MPJPE (top row, e.g., corrected arms) and in image quality measured by PSNR (bottom row, e.g., reconstructed earphone and left elbow).
  • Figure 2: Models for the mirror reflection. In the first case, the rays go from the real camera $\mathbf{c}$ up to the mirror plane $\pi$ intersecting at location $\mathbf{s}$, then to the real person $\mathbf{p}$ after a mirror reflection. In the second case, the real person $\mathbf{p}$ is viewed from a virtual camera $\bar{\mathbf{c}}$ forming a virtual image. In the third case, the person location is mirrored to $\bar{\mathbf{p}}$ and light rays go straight from camera $\mathbf{c}$ to $\bar{\mathbf{p}}$.
  • Figure 3: We start from a mirror image with an unknown mirror geometry. With only 2D detections and suitable assumptions, we reconstruct the mirror plane, ground plane, and 3D keypoints in Step 1 and Step 2. Our optimization yields bone orientation that is crucial for integrating NeRF with the mirror-based reconstruction. The final Mirror-aware Neural Human is learned via layered composition of mirror and real images in Step 3 and yields improved body pose, shape, and appearance quality.
  • Figure 4: Real and mirror pose assignment. Our algorithm distinguishes the real from the virtual person using pelvis-to-neck distance. With the right assignment and correct flipping, cases of collapsed poses (left) are corrected (right).
  • Figure 5: 3D pose initialization. The optimal starting pose is determined by measuring the error between the initial re-projections (lines forming the skeleton) and the 2D detections (dots).
  • ...and 3 more figures
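
The three reflection models in the Figure 2 caption describe the same geometry, which a small numeric sketch can make concrete. The code below is an illustration only, not the authors' implementation: it assumes the mirror plane $\pi$ is given as a normal vector and an offset (so that points on the plane satisfy normal · x + offset = 0), and the helper names `reflect`, `intersect_segment_with_plane`, and `distance` are hypothetical. The symbols c, p, s, c_bar, and p_bar follow the caption; the numeric values are made up for the example.

```python
import math


def dot(a, b):
    """Dot product of two 3D vectors given as sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reflect(point, normal, offset):
    """Reflect a point across the plane {x : normal . x + offset = 0}."""
    norm = math.sqrt(dot(normal, normal))
    n = [x / norm for x in normal]          # unit normal
    signed_dist = dot(n, point) + offset / norm
    return [p - 2.0 * signed_dist * ni for p, ni in zip(point, n)]


def intersect_segment_with_plane(a, b, normal, offset):
    """Point where the segment a -> b crosses the plane.

    Assumes a and b lie on opposite sides of the plane.
    """
    da = dot(normal, a) + offset
    db = dot(normal, b) + offset
    t = da / (da - db)
    return [x + t * (y - x) for x, y in zip(a, b)]


# Assumed example setup: mirror plane z = 2, camera at the origin.
normal, offset = [0.0, 0.0, 1.0], -2.0
c = [0.0, 0.0, 0.0]          # real camera
p = [1.0, 0.5, 1.0]          # real person (a single point here)

c_bar = reflect(c, normal, offset)   # case 2: virtual camera
p_bar = reflect(p, normal, offset)   # case 3: mirrored person
s = intersect_segment_with_plane(c, p_bar, normal, offset)  # case 1: hit point

# All three models give the same optical path length.
path_reflected = distance(c, s) + distance(s, p)   # c -> s -> p
path_virtual_camera = distance(c_bar, p)           # c_bar -> p
path_mirrored_person = distance(c, p_bar)          # c -> p_bar
```

In this setup the three path lengths agree, which is why a method can choose whichever model is most convenient, for example mirroring the person so that rays stay straight.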