
Revisiting an Old Perspective Projection for Monocular 3D Morphable Models Regression

Toby Chong, Ryota Nakajima

Abstract

We introduce a novel camera model for monocular 3D Morphable Model (3DMM) regression methods that effectively captures the perspective distortion effect commonly seen in close-up facial images. Fitting 3D morphable models to video is a key technique in content creation. In particular, regression-based approaches have produced fast and accurate results by matching the rendered output of the morphable model to the target image. These methods typically achieve stable performance with orthographic projection, which eliminates the ambiguity between focal length and object distance. However, this simplification makes them unsuitable for close-up footage, such as that captured with head-mounted cameras. We extend orthographic projection with a new shrinkage parameter, incorporating a pseudo-perspective effect while preserving the stability of the original projection. We present several techniques that allow finetuning of existing models, and demonstrate the effectiveness of our modification through both quantitative and qualitative comparisons using a custom dataset recorded with head-mounted cameras.

Paper Structure (26 sections, 9 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: We revisit perspective projection for 3DMM regression. We introduce a post hoc learnable parameter, compatible with existing methods that use orthographic projection, to improve reconstruction quality on close-up images.
  • Figure 2: Visualization of the newly introduced shrinkage parameter $\rho$. We estimate the 3DMM and camera parameters using SMIRK [SMIRK:CVPR:2024], which employs orthographic projection ($\rho = 0$). We vary $\rho$ from 0.0 to 5.0 while holding all other parameters constant. Unlike perspective projection, which relies on the combination of $f$ and $t_z$ to control the shrinkage effect, $\rho$ isolates this effect and can therefore be incorporated into existing 3DMM regression methods via fine-tuning.
  • Figure 3: We compare our method with the pretrained SMIRK model on the images used in the SMIRK paper. The two produce visually similar reconstruction quality.
  • Figure 4: We compare our method with the pretrained and retrained versions of the SMIRK model on images from the HMC1M dataset.
  • Figure 5: We finetuned the baseline SMIRK model with full perspective projection. The network fails to properly capture the perspective distortion effect, and the results remain mostly orthographic (large $f$ value). For comparison, this converts to $\rho \approx 0.78$ in our camera model, while our method predicts a more visually similar result with $\rho=1.94$.
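
The captions above contrast full perspective projection, where $f$ and $t_z$ jointly control shrinkage, with a single shrinkage parameter $\rho$ that reduces to orthographic projection at $\rho = 0$. The sketch below illustrates one way such a parameter could arise. It is not the paper's formulation, which this page does not state. It assumes a pinhole model with depth $t_z + z$ (with $z$ pointing away from the camera) and the hypothetical reparameterization $s = f / t_z$, $\rho = 1 / t_z$. The function names `perspective` and `pseudo_perspective` are illustrative only.

```python
import math

def perspective(point, f, t_z):
    """Pinhole projection of a point whose local origin sits t_z in front of the camera.

    Assumption: z points away from the camera, so the depth is t_z + z.
    """
    x, y, z = point
    depth = t_z + z
    return (f * x / depth, f * y / depth)

def pseudo_perspective(point, s, rho):
    """Scaled orthographic projection with a hypothetical shrinkage term rho.

    With rho = 0 this is plain scaled orthographic projection (s * x, s * y).
    Larger rho shrinks points that lie farther from the camera.
    """
    x, y, z = point
    w = 1.0 + rho * z
    return (s * x / w, s * y / w)

# Dividing numerator and denominator of f * x / (t_z + z) by t_z gives
# (f / t_z) * x / (1 + z / t_z), so under the assumed reparameterization
# s = f / t_z and rho = 1 / t_z the two models agree.
point = (0.10, 0.05, 0.02)
f, t_z = 1000.0, 0.5
u_persp = perspective(point, f, t_z)
u_pseudo = pseudo_perspective(point, s=f / t_z, rho=1.0 / t_z)

# Focal length / distance ambiguity: scaling f and t_z together keeps s fixed
# and drives rho toward 0, i.e. toward orthographic projection.
u_far = perspective(point, f * 100.0, t_z * 100.0)
u_ortho = pseudo_perspective(point, s=f / t_z, rho=0.0)
```

Under this assumed form, a network that already predicts a scale for orthographic projection only needs one extra scalar, which is consistent with the captions' description of $\rho$ as a post hoc parameter. A large $f$ paired with a large $t_z$ corresponds to a small $\rho$, matching the observation in Figure 5 that a large $f$ yields nearly orthographic results.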