OAHuman: Occlusion-Aware 3D Human Reconstruction from Monocular Images

Yuanwang Yang, Hongliang Liu, Muxin Zhang, Nan Ma, Jingyu Yang, Yu-Kun Lai, Kun Li

Abstract

Monocular 3D human reconstruction in real-world scenarios remains highly challenging due to frequent occlusions from surrounding objects, people, or image truncation. Such occlusions lead to missing geometry and unreliable appearance cues, severely degrading the completeness and realism of reconstructed human models. Although recent neural implicit methods achieve impressive results on clean inputs, they struggle under occlusion due to entangled modeling of shape and texture. In this paper, we propose OAHuman, an occlusion-aware framework that explicitly decouples geometry reconstruction and texture synthesis for robust 3D human modeling from a single RGB image. The core innovation lies in the decoupling-perception paradigm, which addresses the fundamental issue of geometry-texture cross-contamination in occluded regions. Our framework ensures that geometry reconstruction is perceptually reinforced even in occluded areas, isolating it from texture interference. In parallel, texture synthesis is learned exclusively from visible regions, preventing texture errors from being transferred to the occluded areas. This decoupling approach enables OAHuman to achieve robust and high-fidelity reconstruction under occlusion, which has been a long-standing challenge in the field. Extensive experiments on occlusion-rich benchmarks demonstrate that OAHuman achieves superior performance in terms of structural completeness, surface detail, and texture realism, significantly improving monocular 3D human reconstruction under occlusion conditions.

Paper Structure

This paper contains 21 sections, 13 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: OAHuman enables robust 3D human reconstruction under diverse occlusions. Our method successfully reconstructs complete geometry and realistic textures from monocular images across challenging occlusion types, including human-induced (left), object-induced (middle), and in-the-wild scenarios (right). Input highlights are added for visualization to indicate the target subject.
  • Figure 2: Overview of OAHuman. Our method consists of a two-stage pipeline for occlusion-aware 3D human reconstruction. The first stage performs geometry restoration using a visibility-guided coarse completion (VGCC) module with feature-level supervisor regularization (FLSR) and dual-view normal refinement (DVNR). The second stage synthesizes semantically consistent textures via a geometry-faithful diffusion renderer, enabling high-fidelity reconstruction under occlusion.
  • Figure 3: Feature-level supervisor regularization. A supervisor network trained on unoccluded data provides hierarchical feature guidance for geometry completion. Depth-aware feature weighting emphasizes global structure and suppresses low-level noise, while a visibility-aware masking strategy strengthens supervision on occluded body regions.
  • Figure 4: Qualitative comparison of geometry reconstruction results.
  • Figure 5: Qualitative comparison of novel view synthesis results.
  • ...and 8 more figures
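
The feature-level supervisor regularization described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. This is not the authors' formulation, which this excerpt does not give. The function names `flsr_loss` and `resize_nearest`, the linear depth weights, and the `occluded_boost` factor are all hypothetical choices made for illustration. The sketch only shows the general idea: compare the completion network's features to the supervisor's at several levels, weight deeper levels more, and weight occluded pixels more.

```python
import numpy as np


def resize_nearest(mask, h, w):
    """Nearest-neighbour resize of a 2D mask to (h, w)."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[np.ix_(rows, cols)]


def flsr_loss(student_feats, supervisor_feats, visibility, occluded_boost=2.0):
    """Hypothetical feature-level regularization loss.

    student_feats, supervisor_feats: lists of (C, H, W) arrays, ordered
        from shallow to deep; each pair must share a shape.
    visibility: 2D array in [0, 1], where 1 = visible and 0 = occluded.
    occluded_boost: assumed extra weight applied to occluded pixels.
    """
    num_levels = len(student_feats)
    # Assumed depth-aware weighting: grows linearly with depth, sums to 1,
    # so deeper (more global) features dominate the loss.
    depth_w = np.arange(1, num_levels + 1, dtype=np.float64)
    depth_w /= depth_w.sum()

    total = 0.0
    for w, fs, ft in zip(depth_w, student_feats, supervisor_feats):
        _, h, wd = fs.shape
        vis = resize_nearest(visibility, h, wd)
        # Assumed visibility-aware mask: occluded pixels count more.
        pixel_w = 1.0 + (occluded_boost - 1.0) * (1.0 - vis)
        err = ((fs - ft) ** 2).mean(axis=0)  # per-pixel squared error, (H, W)
        total += w * (pixel_w * err).sum() / pixel_w.sum()
    return float(total)
```

Under these assumptions, a feature mismatch inside an occluded region contributes more to the loss than the same mismatch in a visible region, and identical features give a loss of zero.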