Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

Fan Zhang, Ziyue Ji, Weiguang Kang, Weiqing Li, Zhiyong Su

TL;DR

This work tackles single-view 3D reconstruction of thin eyeglasses frames by introducing a mesh-deformation framework that leverages a class-specific template, 42 predefined keypoints, and differentiable rendering. It combines a coarse-to-fine keypoint detector, camera pose estimation, and an unsupervised free-form deformation to progressively warp the template while enforcing image-, silhouette-, and symmetry-based constraints. The approach demonstrates accurate reconstruction on a synthetic dataset and competitive results on real images, outperforming general multi-view and single-view baselines by exploiting domain priors such as frame symmetry and fixed topology. The method is particularly suited for VR/AR try-on applications and offers a path toward more robust, texture-insensitive 3D reconstruction of slender objects. Limitations include topology rigidity and front-view requirements, suggesting future work on wild-image capture and self-supervised enhancements.

Abstract

With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, 3D virtual eyeglasses try-on is well on its way to becoming a popular way to select the perfect pair of eyeglasses in the comfort of your own home. Reconstructing eyeglasses frames from a single image with traditional depth- and image-based methods is extremely difficult because the frames lack sufficient texture features, contain thin elements, and exhibit severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, after constructing a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with predefined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner that detect the predefined keypoints in a coarse-to-fine manner, so that the camera pose can be estimated accurately. After that, using differentiable rendering, we propose a novel optimization approach that produces correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, with constraints drawn from, among others, inherent structure, silhouettes, keypoints, and per-pixel shading information. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.
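To make the free-form deformation step concrete, the following is a minimal NumPy sketch of the classic Bernstein-polynomial FFD, in which mesh vertices are re-expressed as a weighted sum of lattice control points. It is an illustration under stated assumptions, not the authors' implementation: the lattice resolution, the bounding box, and the helper names `bernstein`, `make_lattice`, and `ffd` are hypothetical, and the abstract does not specify which FFD parametrization the paper uses. In a pipeline like the one described, the control-point offsets would be the variables optimized through a differentiable renderer; here only the forward deformation is shown.

```python
import numpy as np
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t), evaluated elementwise for t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def make_lattice(lo, hi, dims):
    """Regular control lattice with (l+1, m+1, n+1) points spanning the box [lo, hi].

    `dims` = (l, m, n) is an assumed lattice degree per axis, chosen for illustration.
    """
    axes = [np.linspace(lo[k], hi[k], dims[k] + 1) for k in range(3)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (l+1, m+1, n+1, 3)


def ffd(vertices, control_points, lo, hi):
    """Deform `vertices` (N, 3) with the given control lattice.

    Each vertex is mapped to local coordinates (s, t, u) in [0, 1]^3 inside the
    box [lo, hi], then rebuilt as a trivariate Bernstein blend of control points.
    """
    l, m, n = (s - 1 for s in control_points.shape[:3])
    stu = (vertices - lo) / (hi - lo)
    out = np.zeros_like(vertices, dtype=float)
    for i in range(l + 1):
        bi = bernstein(l, i, stu[:, 0])
        for j in range(m + 1):
            bj = bernstein(m, j, stu[:, 1])
            for k in range(n + 1):
                bk = bernstein(n, k, stu[:, 2])
                out += (bi * bj * bk)[:, None] * control_points[i, j, k]
    return out
```

With an undisplaced regular lattice the deformation is the identity, and moving every control point by the same offset translates the mesh by that offset. Moving individual control points bends the mesh smoothly without changing its connectivity, which fits the fixed-topology template setting described above.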

Paper Structure

This paper contains 35 sections, 14 equations, 17 figures, and 6 tables.

Figures (17)

  • Figure 1: Reconstruction results of different approaches for an eyeglasses frame in the input image.
  • Figure 2: Overview of the proposed reconstruction framework. During the offline phase, we first build a 3D eyeglasses frame model dataset and then define a template mesh associated with 42 keypoints. In the online phase, given an input RGB image, the keypoints and camera pose are estimated first; the unsupervised free-form deformer then progressively deforms the template mesh, iterating through differentiable rendering to enforce consistency between the rendered result and the input RGB image. The red arrows mark the online optimization loop.
  • Figure 3: Examples of six typical kinds of eyeglasses frames (e.g., rectangle, octagon, and circle) and their 3D models in our dataset.
  • Figure 4: Predefined keypoints on the template.
  • Figure 5: Network architecture of the keypoint detector. The input of a dense block comes from every output of the foregoing dense block. The output of the encoder is squeezed into a vector and is regressed into keypoint coordinates by an MLP.
  • ...and 12 more figures