Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus

Jinchang Zhang, Ningning Xu, Hao Zhang, Guoyu Lu

TL;DR

This work tackles monocular depth estimation by leveraging defocus cues without requiring multi-image focus stacks. It introduces a self-supervised framework that unifies a Siamese Defocus Network (SDNet) for defocus mapping with a 3D Gaussian Splatting renderer, guided by a camera-lens model to produce synthetic defocus and supervision via blur reconstruction. DepthNet then refines a depth prediction using the learned defocus maps and an initial depth from the splatting stage, optimizing with defocus, blur, and reconstruction losses. The approach achieves competitive or superior results on FoD500 and NYUv2 with a single defocused input, and demonstrates practical potential for real-world depth estimation where rapid focus adjustments are impractical.

Abstract

Depth estimation is a fundamental task in 3D geometry. While stereo depth estimation can be achieved through triangulation, monocular methods are less straightforward because they require the integration of global and local information. Depth from Defocus (DFD) methods use camera lens models and parameters to recover depth information from blurred images and have been shown to perform well. However, these methods rely on All-In-Focus (AIF) images for depth estimation, which are nearly impossible to obtain in real-world applications. To address this issue, we propose a self-supervised framework based on 3D Gaussian splatting and Siamese networks. By learning the blur levels at different focal distances of the same scene in the focal stack, the framework predicts the defocus map and Circle of Confusion (CoC) from a single defocused image, and uses the defocus map as input to DepthNet for monocular depth estimation. The 3D Gaussian splatting model renders defocused images using the predicted CoC, and the differences between these renderings and the real defocused images provide additional supervision signals for the self-supervised Siamese Defocus network. The framework has been validated on both synthetic and real blurred datasets. Quantitative and visualization experiments demonstrate that the proposed framework is highly effective as a DFD method.

Paper Structure (14 sections, 16 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: An overview of SDNet. We adopt a Siamese network structure with an MPViT model [lee2022mpvit] and a convolutional layer to enhance defocus map modeling. A defocus loss module [tao2023siamese] learns the relationship between distance and blurriness while the Siamese network is trained. After training, a single blurred image is used to predict the defocus map for depth inference.
  • Figure 2: (a) An illustration of the camera thin-lens model. Objects on the focal plane (indicated by the orange line) are sharply imaged, while objects off the focal plane appear blurred due to the Circle of Confusion (CoC). (b) The CoC curve derived from the NYUv2 dataset, showing the relationship between object depth and blur radius: the blur radius first decreases as depth increases toward the focal plane, then grows again beyond it.
  • Figure 3: Depth estimation results on the MobileDFF dataset. Warmer colors indicate larger depth. We compare against DAIFNet [si2023fully], AiFDepthNet [wang2021bridging], DFV [yang2022deep], and MobileDFF [suwajanakorn2015depth].
  • Figure 4: 3D maps generated from the KITTI Odometry dataset. Each 3D map is created by merging ten consecutive point clouds.
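
The depth-to-blur relationship described in the Figure 2 caption can be illustrated with the standard thin-lens Circle of Confusion formula. The sketch below is only a minimal illustration, not the paper's implementation: the function name `coc_diameter` and the lens settings (50 mm focal length, f/2, focus at 2 m) are assumptions chosen for the example, and the paper's exact lens model and parameters may differ.

```python
def coc_diameter(depth_m, focus_dist_m, focal_len_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in meters.

    Standard formula: c = A * f * |d - d_f| / (d * (d_f - f)),
    where A = f / N is the aperture diameter. The blur radius is c / 2.
    """
    aperture = focal_len_m / f_number
    return (aperture * focal_len_m * abs(depth_m - focus_dist_m)
            / (depth_m * (focus_dist_m - focal_len_m)))


# Hypothetical lens settings, chosen only for illustration.
FOCAL_LEN_M = 0.05   # 50 mm lens
F_NUMBER = 2.0       # f/2 aperture
FOCUS_DIST_M = 2.0   # focal plane at 2 m

depths = [1.0, 1.5, 2.0, 3.0, 4.0]
cocs = [coc_diameter(d, FOCUS_DIST_M, FOCAL_LEN_M, F_NUMBER) for d in depths]

for d, c in zip(depths, cocs):
    # Print the blur diameter in millimeters for each depth.
    print(f"depth {d:.1f} m -> CoC {c * 1e3:.4f} mm")
```

Under these assumed settings the blur shrinks as depth approaches the focal plane, vanishes at the focal plane, and grows again behind it. This also shows why a single blur value is ambiguous about depth: two depths on opposite sides of the focal plane can produce the same CoC.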