Depth from Coupled Optical Differentiation
Junjie Luo, Yuxuan Liu, Emma Alexander, Qi Guo
TL;DR
Depth from coupled optical differentiation is a passive, low-computation monocular 3D sensing method that derives per-pixel depth from the ratio of two image derivatives: one with respect to optical power and one with respect to aperture radius. Under a thin-lens model, the theory yields a closed-form depth equation that is invariant to scene texture, so a depth and confidence map can be computed from four captured images with only 36 floating-point operations per output pixel. The authors build a prototype with a deformable lens and a motorized iris that achieves more than twice the working range of prior depth-from-defocus (DfD) methods at significantly lower computational cost. Confidence-based sparsification further improves depth reliability, and aperture coding (with a pillbox profile) enhances accuracy across depths. The method targets efficient, passive 3D sensing on tiny, power-constrained systems, with potential extensions to single-shot operation and densification.
Abstract
We propose depth from coupled optical differentiation, a low-computation passive-lighting 3D sensing mechanism. It is based on our discovery that per-pixel object distance can be rigorously determined from a coupled pair of optical derivatives of a defocused image using a simple, closed-form relationship. Unlike previous depth-from-defocus (DfD) methods that rely on spatial derivatives of the image to estimate scene depth, the proposed mechanism uses only optical derivatives, which makes it significantly more robust to noise. Furthermore, unlike many previous DfD algorithms that place requirements on the aperture code, this relationship is proved to hold for a broad range of aperture codes. We build the first 3D sensor based on depth from coupled optical differentiation. Its optical assembly includes a deformable lens and a motorized iris, which enable dynamic adjustment of the optical power and the aperture radius. The sensor captures two pairs of images: one pair with a differential change of optical power and the other with a differential change of aperture scale. From the four images, a depth and confidence map can be generated with only 36 floating-point operations per output pixel (FLOPOP), which to our knowledge is more than ten times fewer than the most efficient previous passive-lighting depth sensing solution. Additionally, the depth map generated by the proposed sensor demonstrates more than twice the working range of previous DfD methods while using significantly less computation.
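The four-image computation described above can be sketched per pixel as follows. This is a minimal illustration and not the authors' implementation. It assumes an ideal thin lens with a normalized point spread function whose signed blur scale is sigma = A * (s * (rho - 1/Z) - 1), where rho is the optical power, A the aperture radius, s the lens-to-sensor distance, and Z the object depth. Under that assumption the ratio of the two optical derivatives gives 1/Z = rho - 1/s - A * I_A / I_rho. The function names, the central-difference scheme, the squared-derivative confidence, and the Gaussian-blurred cosine texture used for the check are all hypothetical choices made for this example.

```python
import math


def coupled_depth(i_rho_hi, i_rho_lo, i_a_hi, i_a_lo,
                  rho, aperture, sensor_dist, d_rho, d_a, min_conf=1e-12):
    """Estimate (depth, confidence) for one pixel from four intensities.

    i_rho_hi / i_rho_lo: pixel value at optical power rho + d_rho / rho - d_rho.
    i_a_hi / i_a_lo: pixel value at aperture radius aperture + d_a / aperture - d_a.
    Assumed thin-lens relation: 1/Z = rho - 1/s - A * I_A / I_rho.
    """
    # Central finite differences approximate the two optical derivatives.
    i_rho = (i_rho_hi - i_rho_lo) / (2.0 * d_rho)
    i_a = (i_a_hi - i_a_lo) / (2.0 * d_a)

    # Hypothetical confidence: the ratio is unreliable where I_rho is near zero.
    confidence = i_rho * i_rho
    if confidence < min_conf:
        return math.nan, confidence

    inv_depth = rho - 1.0 / sensor_dist - aperture * i_a / i_rho
    if inv_depth == 0.0:
        return math.inf, confidence
    return 1.0 / inv_depth, confidence


def synthetic_pixel(x, depth, rho, aperture, sensor_dist, omega=4000.0):
    """Cosine texture blurred by a Gaussian of signed scale sigma (assumed model)."""
    sigma = aperture * (sensor_dist * (rho - 1.0 / depth) - 1.0)
    return math.cos(omega * x) * math.exp(-0.5 * (sigma * omega) ** 2)


def demo():
    """Recover a known depth of 0.5 m from four simulated measurements."""
    rho, aperture, sensor_dist, depth = 21.0, 0.005, 0.05, 0.5
    d_rho, d_a = 1e-3, 1e-6
    measurements = (
        synthetic_pixel(0.0, depth, rho + d_rho, aperture, sensor_dist),
        synthetic_pixel(0.0, depth, rho - d_rho, aperture, sensor_dist),
        synthetic_pixel(0.0, depth, rho, aperture + d_a, sensor_dist),
        synthetic_pixel(0.0, depth, rho, aperture - d_a, sensor_dist),
    )
    return coupled_depth(*measurements, rho, aperture, sensor_dist, d_rho, d_a)
```

Calling `demo()` returns a depth estimate and a confidence value for the simulated pixel. Because the texture term cancels in the ratio, the estimate depends on the spatial frequency `omega` only through the confidence, which is the texture invariance the abstract refers to. A real sensor would apply the same arithmetic to every pixel and discard low-confidence outputs.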
