FIND: An Unsupervised Implicit 3D Model of Articulated Human Feet
Oliver Boyne, James Charles, Roberto Cipolla
TL;DR
This work tackles high-fidelity 3D foot reconstruction under limited supervision by introducing FIND, an implicit neural deformation field that deforms a template foot mesh using disentangled latent codes for shape, texture, and pose (each of length $100$). It combines differentiable rendering and an unsupervised part-based loss derived from StyleGAN2 Restyle features to learn 2D–3D part semantics and imposes a 3D surface constraint on per-vertex part probabilities, enabling flexible, multi-resolution mesh outputs. Foot3D, a high-resolution textured foot dataset, supports training and evaluation, with results showing that FIND outperforms a PCA-based baseline in shape quality and part correspondences, and that the unsupervised part-based loss improves image-based fitting, especially with few views. The approach has practical implications for home health, orthotics design, and online footwear by enabling accurate, interpretable foot models on resource-constrained devices.
Abstract
In this paper we present a high-fidelity, articulated 3D human foot model. The model is parameterised by a disentangled latent code in terms of shape, texture and articulated pose. While high-fidelity models are typically created with strong supervision, such as 3D keypoint correspondences or pre-registration, we focus on the difficult case of little to no annotation. To this end, we make the following contributions: (i) we develop a Foot Implicit Neural Deformation field model, named FIND, capable of tailoring explicit meshes at any resolution, i.e. for low- or high-powered devices; (ii) we propose an approach for training our model in various modes of weak supervision, with progressively better disentanglement as more labels, such as pose categories, are provided; (iii) we introduce a novel unsupervised part-based loss for fitting our model to 2D images, which outperforms traditional photometric or silhouette losses; (iv) finally, we release a new dataset of high-resolution 3D human foot scans, Foot3D. On this dataset, we show that our model outperforms a strong PCA implementation trained on the same data in terms of shape quality and part correspondences, and that our novel unsupervised part-based loss improves inference on images.
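To make contribution (i) concrete, the following is a minimal sketch of how an implicit deformation field can produce explicit meshes at any resolution. It is an illustration, not the FIND architecture. The field is a function of a 3D point on the template surface and the latent codes, so the same network can be queried at the vertices of a coarse or a fine template. The code length of 100 follows the TL;DR; the layer sizes, the random weights, the random stand-in "template" points, and all function names are hypothetical, and the texture code is omitted for brevity.

```python
import numpy as np

LATENT_DIM = 100  # per-code length, as stated in the TL;DR


def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [
        (rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def deformation_field(params, points, shape_code, pose_code):
    """Map template surface points (N, 3) to per-point 3D offsets (N, 3)."""
    codes = np.concatenate([shape_code, pose_code])
    # Every point is conditioned on the same latent codes.
    x = np.concatenate([points, np.tile(codes, (points.shape[0], 1))], axis=1)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return x @ w + b


def deform(params, template_vertices, shape_code, pose_code):
    """Deformed vertices = template vertices + predicted offsets."""
    offsets = deformation_field(params, template_vertices, shape_code, pose_code)
    return template_vertices + offsets


rng = np.random.default_rng(0)
params = init_mlp(rng, [3 + 2 * LATENT_DIM, 64, 64, 3])
shape_code = rng.normal(size=LATENT_DIM)
pose_code = rng.normal(size=LATENT_DIM)

# Stand-ins for a low- and a high-resolution version of the template mesh.
coarse_template = rng.uniform(-1.0, 1.0, (50, 3))
fine_template = rng.uniform(-1.0, 1.0, (5000, 3))

coarse_mesh = deform(params, coarse_template, shape_code, pose_code)
fine_mesh = deform(params, fine_template, shape_code, pose_code)
```

Because the offset at a point depends only on that point and the latent codes, the vertex count is chosen at query time rather than fixed by the model. A low-powered device could evaluate the coarse template and a high-powered one the fine template, using the same weights.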
