Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering
Fan Zhang, Ziyue Ji, Weiguang Kang, Weiqing Li, Zhiyong Su
TL;DR
This work tackles single-view 3D reconstruction of thin eyeglasses frames by introducing a mesh-deformation framework that leverages a class-specific template, 42 predefined keypoints, and differentiable rendering. It combines a coarse-to-fine keypoint detector, camera pose estimation, and an unsupervised free-form deformation to progressively warp the template while enforcing image-, silhouette-, and symmetry-based constraints. The approach demonstrates accurate reconstruction on a synthetic dataset and competitive results on real images, outperforming general multi-view and single-view baselines by exploiting domain priors such as frame symmetry and fixed topology. The method is particularly suited for VR/AR try-on applications and offers a path toward more robust, texture-insensitive 3D reconstruction of slender objects. Limitations include topology rigidity and front-view requirements, suggesting future work on wild-image capture and self-supervised enhancements.
Abstract
With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, 3D virtual eyeglasses try-on is becoming a popular way to select a pair of eyeglasses from the comfort of one's own home. Reconstructing eyeglasses frames from a single image with traditional depth- and image-based methods is extremely difficult because the frames lack sufficient texture features, consist of thin elements, and exhibit severe self-occlusion. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, after constructing a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with predefined keypoints. Then, given an input eyeglasses frame image with thin structures and few texture features, we design a keypoint detector and refiner that detect the predefined keypoints in a coarse-to-fine manner so that the camera pose can be estimated accurately. After that, using differentiable rendering, we propose a novel optimization approach that produces correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, using constraints that include inherent structure, silhouettes, keypoints, and per-pixel shading information. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.
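To make the free-form deformation step concrete, the following is a minimal NumPy sketch of a classic Bernstein-polynomial FFD applied to a set of mesh vertices. It is an illustration under stated assumptions, not the authors' implementation: the 3x3x3 lattice size, the axis-aligned bounding-box lattice, and the function names `ffd_weights` and `control_lattice` are all hypothetical, and the differentiable renderer and loss terms are omitted.

```python
import numpy as np
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t), evaluated elementwise."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def ffd_weights(vertices, dims=(3, 3, 3)):
    """Return the (V, L*M*N) trivariate Bernstein weight matrix.

    The lattice is assumed to be the axis-aligned bounding box of the
    template vertices (an assumption made for this sketch).
    """
    lo = vertices.min(axis=0)
    span = vertices.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against a flat axis
    stu = (vertices - lo) / span  # local coordinates in [0, 1]^3
    per_axis = []
    for axis, d in enumerate(dims):
        n = d - 1
        per_axis.append(
            np.stack([bernstein(n, i, stu[:, axis]) for i in range(d)], axis=1)
        )
    bs, bt, bu = per_axis
    # Outer product of the three 1D bases, flattened in (i, j, k) order.
    w = np.einsum("vi,vj,vk->vijk", bs, bt, bu).reshape(len(vertices), -1)
    return w, lo, span


def control_lattice(lo, span, dims=(3, 3, 3)):
    """Regular, undeformed control points in the same (i, j, k) order."""
    axes = [np.linspace(0.0, 1.0, d) for d in dims]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    return lo + grid * span


# Usage: random points stand in for the template mesh vertices.
rng = np.random.default_rng(0)
template = rng.uniform(-1.0, 1.0, size=(100, 3))
W, lo, span = ffd_weights(template)
P = control_lattice(lo, span)

# Control-point offsets are the free variables an optimizer would update.
delta = np.zeros_like(P)
delta[:, 1] += 0.05  # e.g. nudge the whole lattice along y
deformed = W @ (P + delta)
```

Because the deformed vertices are a fixed linear function of the control-point offsets, gradients from any rendering-based loss can flow back to `delta` directly, which is one reason FFD pairs naturally with differentiable rendering. With zero offsets the lattice reproduces the template exactly, so the optimization starts from the undeformed template.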
