FOCUS -- Multi-View Foot Reconstruction From Synthetically Trained Dense Correspondences

Oliver Boyne, Roberto Cipolla

TL;DR

FOCUS tackles foot reconstruction from multi-view RGB images by predicting dense per-pixel correspondences to the FIND-based template via Template Object Coordinates (TOCs). It introduces SynFoot2, a large synthetic dataset with articulated feet and TOCs, and trains an uncertainty-aware TOC predictor. The paper proposes two reconstruction pathways: FOCUS-SfM, an SfM-inspired fusion producing an oriented point cloud and a Poisson mesh, and FOCUS-O, an optimization-based fit of the FIND model to TOC predictions using uncertainty weighting. Results show state-of-the-art performance in few-view regimes and competitive quality with many views, with faster runtimes and CPU-friendly operation. The work enables accurate foot modeling from consumer imagery, with potential benefits for health, footwear, and orthotics applications, and releases both data and code to the research community.

Abstract

Surface reconstruction from multiple, calibrated images is a challenging task, often requiring a large number of collected images with significant overlap. We look at the specific case of human foot reconstruction. As with previous successful foot reconstruction work, we seek to extract rich per-pixel geometry cues from multi-view RGB images, and fuse these into a final 3D object. Our method, FOCUS, tackles this problem with three main contributions: (i) SynFoot2, an extension of an existing synthetic foot dataset to include a new data type: dense correspondence with the parameterized foot model FIND; (ii) an uncertainty-aware dense correspondence predictor trained on our synthetic dataset; (iii) two methods for reconstructing a 3D surface from dense correspondence predictions: one inspired by Structure-from-Motion, and one optimization-based using the FIND model. We show that our method achieves state-of-the-art reconstruction quality in a few-view setting, performs comparably to the state of the art when many views are available, and runs substantially faster. We release our synthetic dataset to the research community. Code is available at: https://github.com/OllieBoyne/FOCUS

Paper Structure

This paper contains 45 sections, 6 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Method overview. (a) We use Blender to render articulated high-resolution meshes, with dense correspondences (TOC) to the generative FIND model. (b) We train a model to predict TOCs and surface normals on real images. (c) We combine these predictions together in a multi-view setting via two methods to yield accurate surface reconstructions: (i) FOCUS-SfM, a Structure-from-Motion based approach; and (ii) FOCUS-O, a model fitting, optimization-based approach.
  • Figure 2: TOC definition. Template Object Coordinates (TOCs), shown on the template of the FIND mesh. RGB values correspond to XYZ, normalized to 0-1 within the template space.
  • Figure 3: Models for rendering. One of 8 foot models used for the synthetic dataset. The mesh has (a) geometry and texture, (b) a TOC mapping to the FIND template model, and (c) a skeleton used for articulation.
  • Figure 4: SynFoot2 examples. We show (a) RGB, (b) TOC, (c) surface normals, and (d) segmentation masks. Further examples are included in the supplementary material.
  • Figure 5: TOC in-the-wild predictions. Predictions on real images, showing (a) RGB input, (b) TOC $\mathbf{t}$, (c) TOC uncertainty $\bm{\sigma}_\mathbf{t}$. Further examples are included in the supplementary material.
  • ...and 6 more figures
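
The TOC definition in the Figure 2 caption (XYZ positions on the template, normalized to 0-1 and shown as RGB) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-axis normalization by the axis-aligned bounding box of the template vertices, which the caption does not specify, and the function names `encode_toc`, `decode_toc`, and `toc_to_rgb8` are hypothetical.

```python
import numpy as np


def encode_toc(points, bbox_min, bbox_max):
    """Map template-space XYZ points to TOC values in [0, 1].

    Assumption: each axis is normalized independently by the template's
    axis-aligned bounding box; the paper may use a different convention.
    """
    points = np.asarray(points, dtype=float)
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    return (points - bbox_min) / (bbox_max - bbox_min)


def decode_toc(toc, bbox_min, bbox_max):
    """Invert encode_toc: recover template-space XYZ from TOC values."""
    toc = np.asarray(toc, dtype=float)
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    return bbox_min + toc * (bbox_max - bbox_min)


def toc_to_rgb8(toc):
    """Quantize TOC values to 8-bit RGB for visualization (R=X, G=Y, B=Z)."""
    toc = np.asarray(toc, dtype=float)
    return np.clip(np.rint(toc * 255.0), 0, 255).astype(np.uint8)


# Toy "template" with three vertices (hypothetical values, not FIND data).
template_vertices = np.array([
    [0.0, -1.0, 2.0],
    [2.0, 1.0, 4.0],
    [1.0, 0.0, 3.0],
])
bbox_min = template_vertices.min(axis=0)
bbox_max = template_vertices.max(axis=0)

toc = encode_toc(template_vertices, bbox_min, bbox_max)
recovered = decode_toc(toc, bbox_min, bbox_max)
colors = toc_to_rgb8(toc)
```

Under this assumption, a per-pixel TOC prediction decodes directly to a point on the template surface, which is what lets pixels in different views be matched to one another through the template.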