Joint Deblurring and 3D Reconstruction for Macrophotography

Yifan Zhao, Liangchen Li, Yuqi Zhou, Kai Wang, Yan Liang, Juyong Zhang

TL;DR

The paper tackles the defocus blur challenge in macrophotography by proposing a self-supervised, joint deblurring and 3D reconstruction framework that operates on multi-view blurred inputs. It models per-pixel defocus as depth-dependent Gaussian blur and optimizes both a sharp 3D scene (via 3D Gaussian Splats) and a BlurNet that predicts spatial blur variances, guided by a differentiable renderer. Key contributions include the first end-to-end joint approach tailored to macro imaging, a depth-aware blur model with a clarity mask, and a multi-stage training strategy that yields high-fidelity 3D appearance and sharp images from few views. The method demonstrates superior deblurring and 3D reconstruction quality over baselines on synthetic and real datasets, highlighting its potential for accurate macro-scale 3D capture without extensive supervised data.
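
The per-pixel Gaussian blur model summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the fixed kernel `radius`, and the gather-style formulation (each output pixel applies its own kernel to its neighborhood) are assumptions made for illustration. The paper's pipeline is differentiable; this version is not.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian kernel; falls back to a delta when sigma is ~0."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    if sigma < 1e-6:
        k = np.zeros_like(xx)
        k[radius, radius] = 1.0  # no blur: identity kernel
        return k
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def spatially_varying_blur(image, variance_map, radius=3):
    """Blur an (H, W, C) image with a different Gaussian at every pixel.

    variance_map is (H, W); entry [y, x] is the Gaussian variance used
    for output pixel (y, x). Hypothetical helper, not from the paper.
    """
    h, w = image.shape[:2]
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.empty_like(image, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            k = gaussian_kernel(np.sqrt(variance_map[y, x]), radius)
            patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            out[y, x] = (patch * k[..., None]).sum(axis=(0, 1))
    return out
```

In a training loop of the kind described, the output of such a blur applied to the sharp rendering would be compared with the blurred input photograph, so that the loss drives both the scene and the predicted variances.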

Abstract

Macro lenses offer high resolution and large magnification, and 3D modeling of small, detailed objects can provide richer information. However, defocus blur in macrophotography is a long-standing problem that severely hinders both clear imaging of the captured objects and their high-quality 3D reconstruction. Traditional image deblurring methods require a large number of images and annotations, and there is currently no multi-view 3D reconstruction method for macrophotography. In this work, we propose a joint deblurring and 3D reconstruction method for macrophotography. Starting from captured multi-view blurry images, we jointly optimize the clear 3D model of the object and the defocus blur kernel of each pixel. The entire framework uses differentiable rendering to self-supervise the optimization of the 3D model and the defocus blur kernel. Extensive experiments show that, from a small number of multi-view images, our method not only achieves high-quality image deblurring but also recovers high-fidelity 3D appearance.

Paper Structure

This paper contains 23 sections, 14 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Comparison of imaging scenes and their characteristics. In microscopic imaging, the imaged objects are often slides that offer no possibility of 3D reconstruction. In macrophotography, the imaged objects have 3D structure but are small and captured with a shallow depth of field, resulting in defocus blur. In natural imaging, objects are distributed over a wide range of depths. In one of our real macrophotography captures, we measured $f+d_0\approx480\text{mm}$ and $\epsilon_{\mathbf{p}}\approx3\text{mm}$. This critical difference makes macro scenes 3D reconstructable but far more sensitive to depth variations than other scenes, which is the problem the proposed method aims to address.
  • Figure 2: An overview of our joint deblurring and 3D reconstruction method. First, we initialize a differentiable coarse 3D scene from the multi-view blurred input images. BlurNet then uses information from the input images to generate a variance map (i.e., a defocus map) for each view. The rendered image is then blurred pixel-wise according to the variance map. The output is a correctly blurred image, supervised by the blurred input images during training. Note that this process ultimately yields a sharp scene.
  • Figure 3: The architecture of BlurNet. At the beginning of training, the clarity mask is generated from the input multi-view images via guided filters. The depth map is centered using the clarity mask as weights to generate a relative depth map for the depth CNN $\mathcal{E}$. The rendered image is concatenated with the clarity mask as input to the RGB CNN $\mathcal{A}$. The outputs of the two CNNs are multiplied to generate a variance map.
  • Figure 4: Visualization of computed clarity masks. Comparing the defocused inputs with their corresponding clarity masks shows how our method identifies high-frequency, in-focus regions (bright areas in the mask) to estimate the focal plane depth.
  • Figure 5: Visual comparisons on novel views of the synthetic image dataset. DbGS, Res, and INIK are shorthand for Deblurring-3DGS, Restormer+3DGS, and INIKNet+3DGS. We color-code the best and the second-best results.
  • ...and 3 more figures
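
The clarity mask and relative depth map described in the captions for Figures 3 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the captions say the mask is produced via guided filters, whereas this sketch swaps in a simple Laplacian-magnitude sharpness measure, and it assumes that "centering" means subtracting the clarity-weighted mean depth (an estimate of the focal plane depth). The function names are hypothetical, and the CNNs $\mathcal{E}$ and $\mathcal{A}$ are not reproduced.

```python
import numpy as np

def sharpness_mask(gray, eps=1e-8):
    """Stand-in clarity mask: normalized |Laplacian| of an (H, W) grayscale image.

    Assumption: the paper uses guided filters; this simpler measure only
    illustrates that in-focus, high-frequency regions receive larger weights.
    """
    p = np.pad(gray, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * gray
    mag = np.abs(lap)
    return mag / (mag.max() + eps)

def relative_depth(depth, mask, eps=1e-8):
    """Center an (H, W) depth map on the clarity-weighted mean depth.

    Returns the relative depth map and the estimated focal plane depth.
    """
    focal_depth = (mask * depth).sum() / (mask.sum() + eps)
    return depth - focal_depth, focal_depth
```

Under this reading, pixels near the focal plane have a relative depth close to zero, which matches the expectation that defocus blur grows with distance from the focal plane.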