Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries

Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo

TL;DR

The paper tackles depth estimation under photon-limited imaging by introducing Blurry-Edges, a patch-based representation that encodes color, boundary position, and boundary smoothness. A two-stage network (a local stage of residual blocks followed by a global Transformer encoder) predicts Blurry-Edges from a pair of differently defocused images, and a closed-form depth-from-defocus (DfD) equation then computes depth along the boundaries. Key contributions are the Blurry-Edges representation, a depth formula derived from boundary smoothness at two defocus levels, and a two-stage learning framework that generalizes to real noisy data. Experiments on synthetic and real data report the highest depth accuracy among the compared DfD methods on photon-limited images, with depth maps concentrated along boundaries.

Abstract

Extracting depth information from photon-limited, defocused images is challenging because depth from defocus (DfD) relies on accurate estimation of defocus blur, which is fundamentally sensitive to image noise. We present a novel approach to robustly measure object depths from photon-limited images along the defocused boundaries. It is based on a new image patch representation, Blurry-Edges, that explicitly stores and visualizes a rich set of low-level patch information, including boundaries, color, and smoothness. We develop a deep neural network architecture that predicts the Blurry-Edges representation from a pair of differently defocused images, from which depth can be calculated using a closed-form DfD relation we derive. The experimental results on synthetic and real data show that our method achieves the highest depth estimation accuracy on photon-limited images compared to a broad range of state-of-the-art DfD methods.
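
To give a rough sense of why boundary smoothness measured at two defocus levels can yield depth in closed form, here is a minimal sketch. It is not the relation derived in the paper. It assumes a thin lens with a Gaussian blur kernel, and that the squared smoothness of a blurred edge is the sum of an intrinsic term and the squared defocus blur. The sensor distance `s`, aperture size `A`, scale factor `k`, and intrinsic smoothness `xi` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def blur_sigma(Z, rho, s, A, k=1.0):
    """Defocus blur std for an object at depth Z (assumed thin-lens model).

    rho is the lens optical power, s the lens-to-sensor distance,
    A the aperture size, k a hypothetical proportionality constant.
    """
    return k * A * np.abs(s * (rho - 1.0 / Z) - 1.0)

def depth_from_smoothness(eta_plus, eta_minus, rho_plus, rho_minus, s, A, k=1.0):
    """Invert the assumed model: eta^2 = xi^2 + sigma^2 for each image.

    The unknown intrinsic smoothness xi cancels in the difference of
    squares, which is linear in 1/Z and can be solved directly.
    """
    diff = eta_plus**2 - eta_minus**2
    inv_Z = (0.5 * (rho_plus + rho_minus) - 1.0 / s
             - diff / (2.0 * (k * A * s) ** 2 * (rho_plus - rho_minus)))
    return 1.0 / inv_Z

# Round trip with made-up, noise-free numbers (all values are assumptions).
s, A = 0.11, 0.005             # sensor distance and aperture size, in meters
rho_plus, rho_minus = 10.5, 10.0  # two optical powers, in diopters
Z_true, xi = 0.9, 1e-5         # true depth and intrinsic edge smoothness

eta_plus = np.sqrt(xi**2 + blur_sigma(Z_true, rho_plus, s, A) ** 2)
eta_minus = np.sqrt(xi**2 + blur_sigma(Z_true, rho_minus, s, A) ** 2)
Z_est = depth_from_smoothness(eta_plus, eta_minus, rho_plus, rho_minus, s, A)
print(Z_true, Z_est)
```

In this toy model the inversion needs only the two smoothness values, which is why noise in their estimates translates directly into depth error; the sketch does not model photon noise at all.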

Paper Structure

This paper contains 33 sections, 52 equations, 17 figures, 8 tables.

Figures (17)

  • Figure 1: Overview. (Left) Blurry-Edges representation parametrically models an image patch's color, boundary positions, and boundary smoothness. Object depths can be analytically calculated from the smoothness of corresponding boundaries in a pair of differently defocused images. (Right) Compared to a variety of state-of-the-art depth from defocus algorithms [guo2017focal, tang2017depth, wu2019phasecam3d, maximov2020focus, yang2022deep, si2023fully], our method generates sparse or dense depth maps with the lowest depth estimation errors from photon-limited, noisy images.
  • Figure 2: Blurry-Edges representation with the number of wedges $l=2$. (a) The $i$th wedge is parameterized by the vertex position $(x_i, y_i)$, the starting and ending angles $(\theta_{i1}, \theta_{i2})$, the color $\boldsymbol{c}_i$, and the boundary smoothness $\eta_i$. The patch is rendered by alpha compositing the wedges. (b) Blurry-Edges can represent a variety of boundary structures, including boundaries with different levels of smoothness.
  • Figure 3: Visualizations from a sample Blurry-Edges representation. (a) The unsigned distance map to the nearest unoccluded boundary, $u \left( \boldsymbol{x}; \boldsymbol{\Psi} \right)$. (b) The corresponding boundary center map, $b \left( \boldsymbol{x}; \boldsymbol{\Psi}, \delta \right)$. (c) The signed distance map of the bottom wedge, $d_1 \left( \boldsymbol{x}; \boldsymbol{\Psi} \right)$. (d) The $\alpha$-map of the bottom wedge, $\alpha_1 \left( \boldsymbol{x}; \boldsymbol{\Psi} \right)$. (e) The color map of the patch, $c \left( \boldsymbol{x}; \boldsymbol{\Psi} \right)$. (f) The magnitude of color derivative map of the patch, $c^{\prime} \left( \boldsymbol{x}; \boldsymbol{\Psi} \right)$.
  • Figure 4: Framework of the proposed model. There are two stages. The local stage consists of residual blocks and predicts the Blurry-Edges representation for each patch locally. The global stage consists of a Transformer Encoder and refines the Blurry-Edges representation for all patches globally. Finally, the framework combines all the per-patch representations and outputs the global boundary map, color map, and depth map.
  • Figure 5: Examples of inputs and global outputs. (a) Noisy input image pair $I_\pm$ captured with different optical powers $\rho_\pm$. (b) Global boundary center map $B \left( \boldsymbol{x} \right)$. (c) Global color map $C \left( \boldsymbol{x} \right)$. (d) Global sparse depth map $Z \left(\boldsymbol{x} \right)$. (e) Sharpened and refocused color maps. (f) Global confidence map $F \left( \boldsymbol{x} \right)$.
  • ...and 12 more figures
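
The wedge parameterization in the Figure 2 caption (vertex, start and end angles, color, smoothness, alpha compositing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's renderer: the signed distance is approximated from the two bounding lines of each wedge, the soft edge profile is a logistic step whose width is set by $\eta_i$, and the `background` color and dictionary layout are hypothetical choices.

```python
import numpy as np

def wedge_signed_distance(xx, yy, vertex, theta1, theta2):
    """Approximate signed distance (positive inside) to a wedge swept
    counterclockwise from theta1 to theta2 around `vertex`."""
    dx, dy = xx - vertex[0], yy - vertex[1]
    # Signed distances to the lines through the two bounding rays.
    d1 = np.cos(theta1) * dy - np.sin(theta1) * dx  # positive left of ray 1
    d2 = np.sin(theta2) * dx - np.cos(theta2) * dy  # positive right of ray 2
    opening = (theta2 - theta1) % (2 * np.pi)
    # Convex wedge: intersection of half-planes; reflex wedge: their union.
    return np.minimum(d1, d2) if opening <= np.pi else np.maximum(d1, d2)

def wedge_alpha(d, eta):
    """Soft occupancy in [0, 1]; a logistic step is an assumed profile."""
    return 0.5 * (1.0 + np.tanh(0.5 * d / eta))

def render_patch(wedges, size=21, background=(0.0, 0.0, 0.0)):
    """Alpha-composite wedges, listed bottom to top, over a background."""
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    out = np.broadcast_to(np.asarray(background, float), (size, size, 3)).copy()
    for w in wedges:
        d = wedge_signed_distance(xx, yy, w["vertex"], w["theta1"], w["theta2"])
        a = wedge_alpha(d, w["eta"])[..., None]
        out = a * np.asarray(w["color"], float) + (1.0 - a) * out
    return out

# One red wedge covering a quadrant, with a one-pixel-wide soft boundary.
patch = render_patch([
    {"vertex": (10.0, 10.0), "theta1": 0.0, "theta2": np.pi / 2,
     "color": (1.0, 0.0, 0.0), "eta": 1.0},
])
print(patch.shape)
```

Increasing `eta` widens the transition across the boundary, which is the quantity the depth calculation relies on when the same boundary is observed at two defocus levels.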