
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu

TL;DR

This work tackles reconstructing 3D human body meshes from monocular images with substantial occlusion, where traditional top-down SMPL-based methods struggle. It introduces Divide and Fuse (D&F), a bottom-up framework built on Human Part Parametric Models (HPPM) that splits the body into $15$ parts and reconstructs each part independently from a few shape and global transformation parameters, using a Swin Transformer backbone. A fusion module with overlapping regions and self-supervised losses ($\mathcal{L}_{ol}$, $\mathcal{L}_{dc}$) then seamlessly connects adjacent parts to form a coherent mesh, even when only a subset of parts is visible. The authors provide two partially visible benchmarks, PV-Human3.6M and PV-3DPW, and demonstrate that D&F yields superior mesh and joint accuracy compared to state-of-the-art methods under heavy occlusion, with ablations confirming the importance of part-wise supervision, overlapping handling, and gradual fusion. Overall, D&F offers a robust, modular approach to partially visible human reconstruction that improves reliability in occluded scenarios and motivates further extensions to richer body models and automatic part detection.

Abstract

We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality.

Paper Structure (17 sections, 21 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Traditional top-down method vs. Divide and Fuse. (a) When the input image shows only a few body parts (1st column), top-down SMPL-based methods can easily fail (2nd column) due to the lack of whole-body information. Our part-based D&F method is designed for partially visible human reconstruction (see results in the 3rd column). (b) Primary framework of SMPL-based prior art versus our proposed model.
  • Figure 2: Our Divide and Fuse (D&F) method takes a monocular partially visible human image as input and generates the human mesh of visible parts. The input image first goes through a backbone and an MLP network to get the parameters of HPPM. Then, these parameters are used to generate part meshes through each part-specific HPPM. Finally, a fusion module connects adjacent visible parts. Details are provided in the method section.
  • Figure 3: (a) HPPM template segmentation. We segment the SMPL template to generate HPPM templates. The joint areas are covered by both adjacent parts (overlap). This design allows HPPM to naturally cover near-joint distortions using shape parameters, while also facilitating the fusion of parts together. (b) HPPM training process. We segment part ground truths from Human3.6M, 3DPW, and AMASS. For each part, we use a dimension-reduction strategy to train a matrix that maps the high-dimensional part meshes into a few shape parameters. Shape parameters are estimated by the network to recover part meshes.
  • Figure 4: HPPM training error of each part as a function of the number of shape parameters used. The number of parameters is adjustable per part: we cap the joint and vertex training errors at 2 mm and use at least 16 parameters. Left: vertex training error. Right: joint training error.
  • Figure 5: Visual ablation on gradual part connecting. When this module is removed, the connection points between two adjacent parts become misaligned, as indicated by the red arrow. Gradual part connecting resolves this misalignment.
  • ...and 2 more figures
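
The captions of Figures 3(b) and 4 describe learning, for each part, a matrix that maps part meshes to a few shape parameters, and choosing the number of parameters so that the training error stays within 2 mm with a floor of 16. A minimal sketch of that idea follows. It assumes PCA (via SVD) as the dimension-reduction strategy, which the captions do not name, and it runs on synthetic data rather than on Human3.6M, 3DPW, or AMASS. All function names (`fit_part_basis`, `encode`, `decode`, `choose_num_params`) are hypothetical, and only the vertex error is checked, not the joint error.

```python
import numpy as np

def fit_part_basis(meshes):
    """Fit a linear basis to part meshes of shape (N, V, 3).
    PCA via SVD is an assumption; the source only says 'dimension reduction'."""
    flat = meshes.reshape(meshes.shape[0], -1)        # (N, 3V)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt                                   # rows of vt: principal directions

def encode(meshes, mean, basis, k):
    """Map part meshes to k shape parameters."""
    flat = meshes.reshape(meshes.shape[0], -1)
    return (flat - mean) @ basis[:k].T                # (N, k)

def decode(params, mean, basis):
    """Recover part meshes from shape parameters."""
    k = params.shape[1]
    flat = params @ basis[:k] + mean
    return flat.reshape(params.shape[0], -1, 3)

def vertex_error(a, b):
    """Mean per-vertex Euclidean distance, in the units of the meshes."""
    return np.linalg.norm(a - b, axis=-1).mean()

def choose_num_params(meshes, mean, basis, max_err, min_params=16):
    """Smallest k >= min_params whose training vertex error is <= max_err."""
    max_k = basis.shape[0]
    for k in range(min(min_params, max_k), max_k + 1):
        recon = decode(encode(meshes, mean, basis, k), mean, basis)
        if vertex_error(meshes, recon) <= max_err:
            return k
    return max_k

# Synthetic stand-in for one body part: 200 meshes with 50 vertices each (metres),
# generated from 20 latent factors so that a low-dimensional basis exists.
rng = np.random.default_rng(0)
template = rng.normal(size=(1, 150)) * 0.1
latent = rng.normal(size=(200, 20))
mixing = rng.normal(size=(20, 150)) * 0.01
meshes = (latent @ mixing + template).reshape(200, 50, 3)

mean, basis = fit_part_basis(meshes)
k = choose_num_params(meshes, mean, basis, max_err=0.002)   # 2 mm threshold
params = encode(meshes, mean, basis, k)
recon = decode(params, mean, basis)
print(k, vertex_error(meshes, recon))
```

In this sketch each part has its own mean and basis and nothing is shared across parts, which mirrors the abstract's statement that HPPM reconstructs parts without inter-part dependency. The global transformation parameters of each part are left out.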