SARS: A Novel Face and Body Shape and Appearance Aware 3D Reconstruction System extends Morphable Models

Gulraiz Khan, Kenneth Y. Wertheim, Kevin Pimbblet, Waqas Ahmed

TL;DR

This work tackles monocular 3D human reconstruction with identity- and expression-aware facial details, proposing SARS—a modular system comprising a face module that leverages semantic priors and a 3DMM-based face representation, a body module built on SPIN/SMPL, and a fusion integration module to produce a coherent full-body mesh. The face module fuses a latent encoding of displacement maps and signed distance fields with high-level semantics (age, gender, landmarks) via a StyleGAN2-based decoder, while the body module refines pose and shape through SMPLify and SMPL. The approach enables end-to-end, modular inference of face and body, offering boundary stitching, attention-based refinement, and a multi-task discriminator to enforce semantic consistency, achieving state-of-the-art or competitive results on MICC Florence 3D, 3DPW, and EHF datasets. The work advances practical 3D avatar creation for AR/VR and fashion by delivering high-fidelity, identity-preserving full-body reconstructions from single images, and it points to future enhancements in semantic control, volumetric body modeling, and real-time performance.

Abstract

3D Morphable Models (3DMMs) are a class of models that take 2D images as input and recover the structure and physical appearance of 3D objects, especially human faces and bodies. A 3DMM combines identity and expression blendshapes with a base face mesh to create a detailed 3D model. The variability of a 3DMM is controlled by tuning its parameters, which act as high-level image descriptors: shape, texture, illumination, and camera parameters. Previous research in 3D human reconstruction concentrated solely on global face structure or geometry, ignoring semantic facial features such as age, gender, and the facial landmarks that characterize facial boundaries, curves, dips, and wrinkles. To accommodate changes in these high-level facial characteristics, this work introduces a shape- and appearance-aware 3D reconstruction system (SARS), a modular pipeline that extracts body and face information from a single image to reconstruct a 3D model of the full human body.
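The blendshape combination described in the abstract is commonly written as a linear model, S = mean + B_id · alpha + B_exp · beta. The following is a minimal sketch of that general formulation, not the SARS implementation: the function name `reconstruct_face`, the toy sizes, and the random bases are all hypothetical and chosen only for illustration.

```python
import numpy as np

def reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta):
    """Linear 3DMM sketch: mean mesh plus identity and expression offsets.

    mean_shape: (N, 3) base face mesh vertices
    id_basis:   (3N, K_id) identity blendshape basis
    exp_basis:  (3N, K_exp) expression blendshape basis
    alpha, beta: identity and expression coefficients
    """
    flat = mean_shape.reshape(-1) + id_basis @ alpha + exp_basis @ beta
    return flat.reshape(-1, 3)

# Toy data (assumed sizes; real 3DMMs have thousands of vertices).
rng = np.random.default_rng(0)
n_vertices, n_id, n_exp = 4, 3, 2
mean_shape = rng.normal(size=(n_vertices, 3))
id_basis = rng.normal(size=(n_vertices * 3, n_id))
exp_basis = rng.normal(size=(n_vertices * 3, n_exp))

# Zero coefficients give back the base mesh unchanged.
neutral = reconstruct_face(mean_shape, id_basis, exp_basis,
                           np.zeros(n_id), np.zeros(n_exp))

# Non-zero coefficients deform the mesh along the basis directions.
deformed = reconstruct_face(mean_shape, id_basis, exp_basis,
                            np.array([0.5, -0.2, 0.1]), np.array([0.3, 0.0]))
```

Tuning `alpha` changes who the face belongs to, while `beta` changes its expression; texture, illumination, and camera parameters would be handled by separate terms that this sketch leaves out.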

Paper Structure

This paper contains 36 sections, 20 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Full framework diagram with four modules highlighted with dotted lines: (1) High-Level Feature Extraction, (2) Face Reconstruction Module, (3) Body Reconstruction Module, (4) Integration Module. Each module's output is labeled with a capital letter: A shows the values of gender, age, and landmarks; B represents a 3D mesh of the face, with vertices and edges; C shows a 3D mesh of the full body as dense vertices and edges; D shows the final output of the merged face and body modules. $\nabla$ represents the mesh with dense vertices, edges, and triangles.
  • Figure 2: A simplified illustration of the previously proposed network M1. The backbone network generates features with dimensions $518\times8\times8$ and passes these shared features to three target branches: age, gender, and landmarks.
  • Figure 3: Module 2: Simplified Architecture Diagram of Face Reconstruction based on high-level Semantic Features (Age, Gender and Landmarks)
  • Figure 4: Sample images from MICC Florence Benchmark
  • Figure 5: Sample images from 3D Pose in the Wild Benchmark
  • ...and 2 more figures
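
The shared-feature design described in the Figure 2 caption, one backbone feature map feeding three task branches, can be sketched as follows. This is a hedged illustration only: the caption gives the feature size ($518\times8\times8$) and the three targets, but the global average pooling, the linear heads, the output sizes (one age value, two gender logits, 68 two-dimensional landmarks), and the name `multitask_heads` are assumptions made here, not details of M1.

```python
import numpy as np

def multitask_heads(features, heads):
    """Apply one linear head per task to a shared feature map.

    features: (C, H, W) backbone output shared by all tasks
    heads:    dict mapping task name -> (weight, bias)
    """
    pooled = features.mean(axis=(1, 2))  # assumed global average pool -> (C,)
    return {task: weight @ pooled + bias
            for task, (weight, bias) in heads.items()}

rng = np.random.default_rng(0)
channels, height, width = 518, 8, 8   # feature size stated in the caption
n_landmarks = 68                      # assumption, not stated in the source

# Random stand-in for the backbone output.
features = rng.normal(size=(channels, height, width))

# Hypothetical head sizes: scalar age, 2 gender logits, (x, y) per landmark.
heads = {
    "age": (rng.normal(size=(1, channels)) * 0.01, np.zeros(1)),
    "gender": (rng.normal(size=(2, channels)) * 0.01, np.zeros(2)),
    "landmarks": (rng.normal(size=(2 * n_landmarks, channels)) * 0.01,
                  np.zeros(2 * n_landmarks)),
}

outputs = multitask_heads(features, heads)
```

Sharing the backbone means the feature map is computed once per image, and each branch adds only a small task-specific cost on top of it.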