FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data

Oliver Boyne, Gwangbin Bae, James Charles, Roberto Cipolla

TL;DR

This work develops a method for few-view reconstruction of the human foot. It shows that the proposed normal predictor significantly outperforms all off-the-shelf equivalents on real images, and that the optimization scheme outperforms state-of-the-art photogrammetry pipelines, especially in a few-view setting.

Abstract

Surface reconstruction from multi-view images is a challenging task, with solutions often requiring a large number of sampled images with high overlap. We seek to develop a method for few-view reconstruction, for the case of the human foot. To solve this task, we must extract rich geometric cues from RGB images, before carefully fusing them into a final 3D object. Our FOUND approach tackles this, with 4 main contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images, paired with ground truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high resolution ground truth geometry. We show that our normal predictor outperforms all off-the-shelf equivalents significantly on real images, and our optimization scheme outperforms state-of-the-art photogrammetry pipelines, especially for a few-view setting. We release our synthetic dataset and baseline 3D scans to the research community.

Paper Structure (59 sections, 5 equations, 15 figures, 3 tables)


Figures (15)

  • Figure 1: Method overview: (a) we use Blender to synthetically render foot images, masks, surface normals, and keypoints; (b) we train a normal predictor on this data; (c) we predict normals on real images and optimize in a multi-view, calibrated setup to reconstruct the foot, evaluating on a ground truth scan.
  • Figure 2: Samples from SynFoot, our synthetic dataset. We show (a) RGB, (b) silhouettes, (c) surface normals, and (d) keypoints. Further examples are included in the supplementary.
  • Figure 3: The 12 keypoints defined for our dataset, shown on two synthetic images. Detailed keypoint definitions are included in the supplementary.
  • Figure 4: Our method optimizing to input images of a real foot, visualized for three views: (a) RGB, (b) silhouette, (c) normals, and (d) keypoints, each showing (i) the real input image, (ii) the fitted FIND model, and (iii) the error. Note that the RGB for FIND uses FIND's default texture. For the reconstruction quality compared to COLMAP and the GT scan, see the reconstruction comparison figure (fig:reconstr).
  • Figure 5: An example of a scan from our dataset. We show a sample of the calibrated images; the ground truth scan in grey; and the COLMAP reconstructed dense point cloud in green.
  • ...and 10 more figures