BLiSS: Bootstrapped Linear Shape Space

Sanjeev Muralikrishnan, Chun-Hao Paul Huang, Duygu Ceylan, Niloy J. Mitra

TL;DR

BLiSS introduces a bootstrapped approach to building expressive human shape spaces by jointly learning a linear PCA-based space and a nonlinear Neural Jacobian Field (NJF) deformation mechanism. It starts from a small set of manually registered scans and iteratively enlarges the shape space by automatically registering unregistered scans, enriching details beyond the linear basis. Empirical results on the CAESAR dataset show that BLiSS can match or exceed state-of-the-art morphable models (SMPL, STAR, GHUM) using only around 5% of the manual annotations, achieving a vertex-to-vertex error of about $0.90$ cm after about 800 unregistered scans, and an upper-bound performance near $0.87$ cm when fully annotated. The method also demonstrates applicability to single-image shape estimation (via SMPLify-X integration) and generalization to face data, though it does not currently handle pose corrective spaces or complex hand poses, pointing to future extensions with iterative nonlinear refinements.
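The vertex-to-vertex (v2v) error quoted above is usually computed as the mean Euclidean distance between corresponding vertices of a registration and its ground-truth mesh, which share the same template topology. The minimal sketch below assumes vertices are given in meters and matched by index. The function name `v2v_error_cm` is hypothetical, and how BLiSS averages over a test set is not specified here.

```python
import math

def v2v_error_cm(pred, gt):
    """Mean vertex-to-vertex distance, reported in centimeters.

    pred, gt: sequences of (x, y, z) vertex positions in meters that share
    the same template topology, so pred[i] corresponds to gt[i].
    (Hypothetical helper for illustration; not the authors' code.)
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("meshes must be non-empty and share the same vertex count")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return 100.0 * total / len(pred)  # meters -> centimeters

# Usage: one vertex is off by 1 cm, the other is exact.
print(v2v_error_cm([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                   [(0.0, 0.0, 0.01), (1.0, 0.0, 0.0)]))
```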

Abstract

Morphable models are fundamental to numerous human-centered processes as they offer a simple yet expressive shape space. Creating such morphable models, however, is both tedious and expensive. The main challenge is establishing dense correspondences across raw scans that capture sufficient shape variation. This is often addressed using a mix of significant manual intervention and non-rigid registration. We observe that creating a shape space and solving for dense correspondence are tightly coupled -- while dense correspondence is needed to build shape spaces, an expressive shape space provides a reduced dimensional space to regularize the search. We introduce BLiSS, a method to solve both progressively. Starting from a small set of manually registered scans to bootstrap the process, we enrich the shape space and then use that to get new unregistered scans into correspondence automatically. The critical component of BLiSS is a non-linear deformation model that captures details missed by the low-dimensional shape space, thus allowing progressive enrichment of the space.
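The linear half of the method is a PCA shape space learned from registered meshes, which supplies the reduced-dimensional space the abstract mentions. The sketch below assumes each registration is a (V, 3) vertex array in template topology; the function names `build_pca_basis` and `project` are hypothetical. Projecting a raw, unregistered scan, as BLiSS does before its NJF refinement, would require fitting the coefficients against the scan geometry. Here, by contrast, the input is assumed to already share the template topology.

```python
import numpy as np

def build_pca_basis(registrations, k):
    """Learn a linear shape space from registered meshes (hypothetical helper).

    registrations: array of shape (n_shapes, n_verts, 3), all in template topology.
    Returns the flattened mean shape (length 3 * n_verts) and the top-k
    principal directions as the rows of a (k, 3 * n_verts) array.
    """
    X = registrations.reshape(len(registrations), -1)  # one row per shape
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(shape, mean, basis):
    """Project a template-topology shape onto the basis and reconstruct it."""
    coeffs = basis @ (shape.reshape(-1) - mean)  # low-dimensional coordinates
    recon = (mean + coeffs @ basis).reshape(shape.shape)
    return recon, coeffs

# Usage with synthetic data: 20 hypothetical registrations of 50 vertices each.
rng = np.random.default_rng(0)
regs = rng.normal(size=(20, 50, 3))
mean, basis = build_pca_basis(regs, k=5)
recon, coeffs = project(regs[0], mean, basis)
```

With `k` equal to the rank of the centered data, `project` reproduces a training shape exactly; a smaller `k` gives a lower-dimensional search space but loses detail, which is the gap the non-linear deformation model is meant to close.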

Paper Structure (25 sections, 4 equations, 9 figures, 4 tables, 1 algorithm)

Figures (9)

  • Figure 1: Given a sparse set of scans $\mathcal{S}_R$ and their registrations $\mathcal{R}$ to a common template, we learn a linear shape space $\mathcal{B}_\mathit{PCA}$ using $\mathcal{R}_\mathit{PCA}$ and train a non-linear NJF-based deformation model using $\mathcal{R}_\mathit{DEFORM}$. Then, given a scan $S_U$ from a set of unregistered scans $\mathcal{U}$, we project it onto the PCA basis to obtain $X_o$ and use the NJF-based deformation to recover its registration $X'$ to the template in the canonical pose. To enhance the shape space, we compute the Chamfer distance ($D_{CD}$) between each registration and its target scan, and add to $\mathcal{R}_\mathit{PCA}$ every registration whose distance falls within one standard deviation of the minimum distance. We repeat this process to jointly register raw scans and enrich the shape space.
  • Figure 2: We show the histogram of the v2v error of the scans in our test set at different iterations of our method. We also color-code the per-vertex error for an example scan. As our method progresses, the error decreases, and we observe a slight left shift in the histogram as the shape space improves. Insets show the residual error on one scan over iterations.
  • Figure 3: For a given raw scan, we register each body model by predicting its pose and body-shape parameters. (Top) Each result is color-coded by the v2p error (in meters) with respect to the artist-provided ground-truth registration.
  • Figure 4: We show shapes along the top three principal directions in different rows, and observe variations in gender, height, and weight along the respective PCA modes.
  • Figure 5: Left: Registration (pink) of noisy scans (blue) with our final shape space. Since our model does not capture finger-level details, after optimization, the joints corresponding to the greyed-out regions are reset to default poses. Right: We show sampled faces from our final face-shape space after growing it from $20 \rightarrow 800$ shapes. We observe a variety of face changes in the cheek and nose regions. (Bottom) We take the test scans from the COMA dataset (in blue) and register them in our final face-shape space, which is shown in pink.
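The candidate-selection step described in the Figure 1 caption can be sketched as follows. Two conventions are assumptions, since the caption fixes neither: the Chamfer distance is taken as the mean nearest-neighbour Euclidean distance in each direction, summed, and "one standard deviation" uses the population standard deviation of all candidate distances. The helpers `chamfer_distance` and `select_for_pca` are hypothetical, and the brute-force nearest-neighbour search is only suitable for small point sets.

```python
import math
import statistics

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (assumed convention).

    Mean nearest-neighbour Euclidean distance from a to b plus the same
    from b to a, computed by brute force in O(len(a) * len(b)).
    """
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def select_for_pca(distances):
    """Indices of registrations whose distance lies within one (population)
    standard deviation of the minimum, i.e. d <= min(d) + std(d)."""
    threshold = min(distances) + statistics.pstdev(distances)
    return [i for i, d in enumerate(distances) if d <= threshold]

# Usage: the third registration fits its scan poorly and is left out.
print(select_for_pca([0.2, 0.25, 0.9, 0.3]))
```

In a bootstrapping loop, the selected indices would choose which new registrations join $\mathcal{R}_\mathit{PCA}$ before the linear basis is refit; the best-fitting registration is always included, since its distance equals the minimum.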