MeshPose: Unifying DensePose and 3D Body Mesh reconstruction

Eric-Tuan Lê, Antonis Kakolyris, Petros Koutras, Himmy Tam, Efstratios Skordos, George Papandreou, Rıza Alp Güler, Iasonas Kokkinos

TL;DR

The MeshPose system is trained in an end-to-end manner and is the first HMR method to attain competitive DensePose accuracy, while also being lightweight and amenable to efficient inference, making it suitable for real-time AR applications.

Abstract

DensePose provides a pixel-accurate association of images with 3D mesh coordinates, but does not provide a 3D mesh, while Human Mesh Reconstruction (HMR) systems have high 2D reprojection error, as measured by DensePose localization metrics. In this work we introduce MeshPose to jointly tackle DensePose and HMR. For this we first introduce new losses that allow us to use weak DensePose supervision to accurately localize in 2D a subset of the mesh vertices ('VertexPose'). We then lift these vertices to 3D, yielding a low-poly body mesh ('MeshPose'). Our system is trained in an end-to-end manner and is the first HMR method to attain competitive DensePose accuracy, while also being lightweight and amenable to efficient inference, making it suitable for real-time AR applications.

Paper Structure (36 sections, 14 equations, 12 figures, 6 tables)


Figures (12)

  • Figure 1: DensePose prediction systems are pixel-accurate but do not provide a 3D mesh, while human mesh recovery systems do not provide pixel-accurate 2D reprojection. We propose MeshPose, a novel human mesh recovery method that combines the benefits of both.
  • Figure 2: Left: Inference Time vs DensePose AP, Right: PA-MPJPE vs DensePose AP -- for both, top-left is best and radii are proportional to the sizes of the models (MB). Our approach outperforms HMR methods on DensePose metrics by more than 50% while having close to state-of-the-art 3D accuracy. By combining the highest FPS rate and small model size with state-of-the-art reprojection accuracy, our pipeline is well suited for mobile inference.
  • Figure 3: MeshPose architecture: The lower VertexPose branch extracts multiple heatmaps from which, by applying the spatial argsoftmax operation, it computes precise $x$ and $y$ coordinates for all the vertices inside the input crop. The upper Regression branch computes the coordinates ($x$, $y$, and vertex depth $z$) for all vertices, along with their visibility scores $w$. The score $w$ takes lower values when the corresponding vertex is either occluded or falls outside the crop area. We differentiably combine the VertexPose and regressed coordinates via $w$ to get the final 3D mesh. We densely supervise the intermediate per-vertex heatmaps and the final output with UV, mesh and silhouette cues to end up with a low-latency, image-aligned, in-the-wild HMR system.
  • Figure 4: Geometry-driven losses used to supervise VertexPose with DensePose ground-truth. Our barycentric loss requires that the per-pixel distribution over VertexPose matches the UV annotation's barycentrics. Our UV consistency loss requires that the UV annotation's barycentrics at a labelled pixel $\mathbf{x}$ should recover $\mathbf{x}$ based on a similar combination of VertexPose vertices into $\hat{\mathbf{x}}$.
  • Figure 5: Qualitative comparison on COCO against 4 state-of-the-art mesh reconstruction systems. MeshPose is robust to severe occlusions, partial body cropping and body shapes.
  • ...and 7 more figures
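
The operations named in the Figure 3 and Figure 4 captions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the convex blend by $w$ is only one assumed way to "differentiably combine" the two branches, and the Euclidean distance is an assumed choice of penalty for the UV consistency term. The sketch uses plain Python lists in place of tensors.

```python
import math


def spatial_soft_argmax(heatmap):
    """Expected (x, y) location under a softmax taken over a 2D heatmap.

    heatmap: list of H rows, each a list of W float logits.
    """
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def blend_xy(vertexpose_xy, regressed_xy, w):
    """ASSUMED fusion rule: convex blend of the two branches.

    w near 1 trusts the heatmap-based estimate; w near 0 (occluded or
    out-of-crop vertex) falls back to the regressed estimate.
    """
    return tuple(w * a + (1.0 - w) * b for a, b in zip(vertexpose_xy, regressed_xy))


def uv_consistency_residual(pixel_xy, triangle_xy, barycentrics):
    """Distance between a labelled pixel x and x_hat, where x_hat is the
    barycentric combination of three predicted 2D vertex positions.
    The Euclidean norm is an assumption made for illustration.
    """
    x_hat = [sum(b * v[i] for b, v in zip(barycentrics, triangle_xy)) for i in range(2)]
    return math.dist(pixel_xy, x_hat)


if __name__ == "__main__":
    heatmap = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 10.0],
               [0.0, 0.0, 0.0]]
    print(spatial_soft_argmax(heatmap))  # close to the peak at x=2, y=1
    print(blend_xy((2.0, 4.0), (0.0, 0.0), 0.5))
    print(uv_consistency_residual((1.0, 1.0),
                                  [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)],
                                  (0.5, 0.25, 0.25)))
```

Because the soft-argmax is an expectation rather than a hard maximum, it yields sub-pixel coordinates and stays differentiable with respect to the heatmap, which is what lets a 2D localization loss such as the UV consistency residual be back-propagated to the heatmaps.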