Depth from Coupled Optical Differentiation

Junjie Luo, Yuxuan Liu, Emma Alexander, Qi Guo

TL;DR

Depth from Coupled Optical Differentiation introduces a passive, low-computation monocular 3D sensing method that derives per-pixel depth from the ratio of image derivatives with respect to optical power and aperture. The core theory yields a closed-form depth equation that is invariant to scene texture under a thin-lens model, enabling a four-image capture strategy with only 36 FLOPs per output pixel. The authors demonstrate a prototype using a deformable lens and motorized iris, achieving a working range more than twice that of prior DfD methods while significantly reducing computation. Confidence-based sparsification further improves depth reliability, and aperture coding (with a pillbox profile) enhances accuracy across depths. This work enables efficient, passive 3D sensing suitable for tiny, power-constrained systems, with potential extensions to single-shot operation and densification.

Abstract

We propose depth from coupled optical differentiation, a low-computation passive-lighting 3D sensing mechanism. It is based on our discovery that per-pixel object distance can be rigorously determined by a coupled pair of optical derivatives of a defocused image using a simple, closed-form relationship. Unlike previous depth-from-defocus (DfD) methods that leverage spatial derivatives of the image to estimate scene depths, the proposed mechanism's use of only optical derivatives makes it significantly more robust to noise. Furthermore, unlike many previous DfD algorithms with requirements on aperture code, this relationship is proved to be universal to a broad range of aperture codes. We build the first 3D sensor based on depth from coupled optical differentiation. Its optical assembly includes a deformable lens and a motorized iris, which enables dynamic adjustments to the optical power and aperture radius. The sensor captures two pairs of images: one pair with a differential change of optical power and the other with a differential change of aperture scale. From the four images, a depth and confidence map can be generated with only 36 floating point operations per output pixel (FLOPOP), more than ten times lower than the previous lowest passive-lighting depth sensing solution to our knowledge. Additionally, the depth map generated by the proposed sensor demonstrates more than twice the working range of previous DfD methods while using significantly lower computation.
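
The texture invariance claimed above can be illustrated with a minimal sketch. It is an idealized derivation under stated assumptions, not the paper's exact equation. Assume a thin lens with normalized image brightness, so that each pixel depends on $\rho$ and $A$ only through the signed blur radius $b = A Z_s (\rho - 1/Z - 1/Z_s)$, i.e. $I = F(b)$ for some unknown texture-dependent function $F$. Then $I_\rho = F'(b)\, A Z_s$ and $I_A = F'(b)\, b / A$, so the unknown $F'(b)$ cancels in the ratio and $1/Z = \rho - 1/Z_s - A\, I_A / I_\rho$. The function names, the toy image model, and all numeric values below are hypothetical.

```python
import numpy as np

def toy_image(rho, A, Z, Z_s, amplitude, width):
    """Hypothetical per-pixel model: intensity depends on (rho, A) only via b."""
    b = A * Z_s * (rho - 1.0 / Z - 1.0 / Z_s)      # signed thin-lens blur radius
    return amplitude * np.exp(-(b / width) ** 2)   # arbitrary smooth texture response

def depth_from_four_images(I_rho_p, I_rho_m, I_A_p, I_A_m,
                           rho, A, d_rho, d_A, Z_s):
    """Estimate depth from central finite differences of the four images."""
    I_rho = (I_rho_p - I_rho_m) / (2.0 * d_rho)    # derivative w.r.t. optical power
    I_A = (I_A_p - I_A_m) / (2.0 * d_A)            # derivative w.r.t. aperture radius
    # Where I_rho is zero (in focus or textureless) the result is inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = 1.0 / (rho - 1.0 / Z_s - A * I_A / I_rho)
    return depth, I_rho, I_A

# Four "pixels" at different depths, each with a different texture response.
Z_true = np.array([0.5, 0.8, 1.5, 2.0])            # metres (assumed values)
Z_s, A = 0.05, 0.005                               # sensor distance, aperture radius
rho = 1.0 / 1.0 + 1.0 / Z_s                        # optical power focusing at 1 m
d_rho, d_A = 0.01, 1e-5                            # finite-difference steps
amplitude = np.array([1.0, 0.3, 2.0, 0.7])
width = np.array([2e-4, 3e-4, 1.5e-4, 2.5e-4])

Z_hat, I_rho, I_A = depth_from_four_images(
    toy_image(rho + d_rho, A, Z_true, Z_s, amplitude, width),
    toy_image(rho - d_rho, A, Z_true, Z_s, amplitude, width),
    toy_image(rho, A + d_A, Z_true, Z_s, amplitude, width),
    toy_image(rho, A - d_A, Z_true, Z_s, amplitude, width),
    rho, A, d_rho, d_A, Z_s)
print(Z_hat)
```

Although every pixel uses a different amplitude and width, the recovered depths agree with `Z_true` up to finite-difference error, because the texture term appears in both derivatives and divides out. In this toy model a pixel exactly in focus has $I_\rho = 0$, so the ratio is undefined there.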

Paper Structure

This paper contains 23 sections, 25 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: (a) Technological advantages of the proposed method. We plot the computational complexity, measured in floating point operations per output pixel (FLOPOP), and the working range of a series of efficient monocular, passive-lighting depth sensors. The proposed solution achieves a significantly lower computational complexity and longer working range compared to the previous best. (b) System diagram. The proposed depth sensor captures four images of a fixed scene with different optical settings and produces a sparse depth map with only 36 FLOPOP.
  • Figure 2: (a) Principle of coupled optical differentiation. Consider a thin lens camera with sensor distance $Z_s$ and adjustable optical power $\rho$ and aperture radius $\mathnormal{A}$. The image it captures is a function of these two optical parameters, $\rho$ and $\mathnormal{A}$, denoted as $I(\rho, \mathnormal{A})$. In this work, we show that the ratio of the optical derivatives, $I_\mathnormal{A} / I_\rho$, reveals the object depth $Z$ at each pixel through closed-form solutions. (b) Images of the same object captured with different optical power $\rho$ and aperture radius $\mathnormal{A}$. By adjusting the optical parameters $\rho, \mathnormal{A}$, the camera can capture images $I$ of the object with different defocus levels. In practice, we can build a system to capture the four highlighted images $I(\rho+\Delta\rho, \mathnormal{A}), I(\rho-\Delta\rho, \mathnormal{A}), I(\rho, \mathnormal{A}+\Delta\mathnormal{A}), I(\rho, \mathnormal{A}-\Delta\mathnormal{A})$ to estimate the optical derivatives $I_\rho$ and $I_\mathnormal{A}$ via finite difference. (c) Pixel intensity vs. optical power $\rho$. The colored markers indicate the intensities of corresponding image pixels in (b). The intensity varies in textured regions, e.g., pixel $\bullet$, when the object is out of the depth-of-field (DoF). Meanwhile, the intensity is close to constant in textureless regions, such as at pixel $\circ$. (d) Pixel intensity vs. aperture radius $\mathnormal{A}$. The plot visualizes the intensity of pixel $\bullet$ as a function of aperture radius $\mathnormal{A}$ under three different aperture radii, $\mathnormal{A}-\Delta\mathnormal{A}, \mathnormal{A}, \mathnormal{A}+\Delta\mathnormal{A}$. As the images with optical power $\rho+\Delta\rho$ are in focus (see b), the pixel intensity stays approximately constant w.r.t. the aperture radius $\mathnormal{A}$ (pink curve).
  • Figure 3: Working range of the proposed method and Focal Track \cite{guo2017focal}. (a) Mean absolute error (MAE) as a function of depth, with the black dashed line marking 10% of the depth value. The highlighted regions indicate the working ranges of both methods. Throughout the paper, we define the working range as the set of depths where the MAE is smaller than $10\%$ of the true depth. The proposed method's working range is four times that of Focal Track. (b) Signal-to-noise ratio (SNR) of the optical derivatives $I_\mathnormal{A}, I_\rho$ and the spatial derivative $\nabla^2 I$. The optical derivatives generally have a significantly larger SNR than the spatial derivative $\nabla^2 I$. This explains the higher accuracy and longer working range of the proposed method, which uses only the optical derivatives $I_\mathnormal{A}, I_\rho$, whereas Focal Track relies on the spatial derivative $\nabla^2 I$ for depth estimation. (c) Enlarged portion of (b). The SNRs of the optical derivatives $I_\mathnormal{A}, I_\rho$ drop when the object is in focus, i.e., at around 1 m, as explained in Sec. \ref{secsec:fail}. This accounts for the proposed method's sudden MAE increase at 1 m in (a).
  • Figure 4: Effect of confidence. (a) The MAE of the predicted depth (blue) and the sparsity (yellow) as a function of the confidence threshold. We filter out depth predictions whose confidence values fall below a predefined confidence threshold. As the confidence threshold increases, only pixels with higher confidence values remain, and the sparsity of the depth map increases. The blue curve shows the MAE decreasing as the confidence threshold increases, which indicates that the confidence metric is effective. (b) MAE as a function of true depth with different confidence thresholds. As the confidence threshold increases, the sparsity increases and the MAE generally drops at all depths. We label the overall sparsity and highlight the working range for each curve. (c) Working range as a function of overall sparsity, a proxy for the confidence threshold.
  • Figure 5: Aperture transmittance profile analysis. (a) Four different apertures parameterized using Eq. \ref{eq:aper_code_eq}. The colors of the boxes indicate the corresponding curves in (b) and (c). (b) Amplitude spectrum of the finite optical derivative of the PSFs, $k(\rho+\Delta\rho) - k(\rho-\Delta\rho)$, for each aperture transmittance profile at a specific depth. The black curve indicates the 1/f statistics of natural textures. The pillbox aperture (green) achieves the highest overall amplitude, followed by the smooth disk. This ordering of the amplitude spectra is typical at other depths within the working range. (c) The MAE of different aperture transmittance profiles. Consistent with (b), the pillbox aperture achieves the lowest MAE over a wide range of depths.
  • ...and 8 more figures
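
The working-range criterion in Figure 3 and the confidence filtering in Figure 4 can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code. It assumes that sparsity means the fraction of pixels discarded by the threshold, which the captions imply but do not define. The function names and sample numbers are hypothetical.

```python
import numpy as np

def sparsify(confidence, threshold):
    """Keep pixels whose confidence reaches the threshold.

    Returns the boolean keep-mask and the sparsity, taken here (an assumption)
    to be the fraction of pixels that were filtered out.
    """
    confidence = np.asarray(confidence, dtype=float)
    keep = confidence >= threshold
    sparsity = 1.0 - keep.mean()
    return keep, sparsity

def in_working_range(true_depths, mae, rel_tol=0.10):
    """Mark depths where the MAE is below rel_tol (10%) of the true depth."""
    true_depths = np.asarray(true_depths, dtype=float)
    mae = np.asarray(mae, dtype=float)
    return mae < rel_tol * true_depths

# Hypothetical per-depth errors (metres) and per-pixel confidences.
true_depths = np.array([0.5, 1.0, 1.5, 2.0])
mae = np.array([0.02, 0.20, 0.10, 0.50])
mask = in_working_range(true_depths, mae)

keep, sparsity = sparsify([0.1, 0.5, 0.9, 0.3], threshold=0.4)
print(mask, keep, sparsity)
```

A mask is returned instead of a single interval because the qualifying depths need not be contiguous: Figure 3 notes a sudden MAE increase near the in-focus depth, which can split the range in two.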