Deep Phase Coded Image Prior

Nimrod Shabtay, Eli Schwartz, Raja Giryes

TL;DR

The paper tackles passive depth estimation and all-in-focus reconstruction from a single phase-coded image without requiring ground-truth depth maps. It introduces Deep Phase Coded Image Prior (DPCIP), a zero-shot framework that jointly optimizes an implicit generator and a differentiable camera model to produce an all-in-focus image and a depth map that, after passing through the forward model, reproduce the captured image. By integrating a phase-mask design with an implicit neural representation and a deep image prior, DPCIP achieves results competitive with or superior to supervised methods under the same optical system, and it is robust to moderate optical mismatch. The approach reduces dataset dependencies, works with existing phase-coded cameras in real-world settings, and can generate high-quality pseudo-ground-truth for supervised downstream tasks.

Abstract

Phase-coded imaging is a computational imaging method designed to tackle tasks such as passive depth estimation and extended depth of field (EDOF) using depth cues inserted during image capture. Most current deep learning-based methods for depth estimation or all-in-focus imaging require a training dataset with high-quality depth maps and an optimal focus point at infinity for all-in-focus images. Such datasets are difficult to create, usually synthetic, and require external graphics programs. We propose a new method named "Deep Phase Coded Image Prior" (DPCIP) for jointly recovering the depth map and all-in-focus image from a phase-coded image using solely the captured image and the optical information of the imaging system. Our approach does not depend on any specific dataset and surpasses prior supervised techniques that use the same imaging system. This improvement is achieved by a problem formulation based on implicit neural representation (INR) and deep image prior (DIP). Because our method is zero-shot, we overcome the barrier of acquiring accurate ground-truth depth maps and all-in-focus images for each new phase-coded system introduced. This allows focusing mainly on developing the imaging system, and not on ground-truth data collection.
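The analysis-by-synthesis idea described above (render a candidate all-in-focus image and depth map through a camera model, then compare the result with the captured image) can be illustrated with a deliberately tiny 1D toy. This is only a sketch under loud assumptions: the box kernels, the two depth bins, and the helper names `box_kernel`, `blur`, `forward_model`, and `mse` are all hypothetical stand-ins, not the paper's phase-mask PSFs or its camera model, and no generator or gradient-based optimization is shown.

```python
def box_kernel(radius):
    """Normalized 1D box kernel, a hypothetical stand-in for a depth-dependent PSF."""
    width = 2 * radius + 1
    return [1.0 / width] * width


def blur(signal, kernel):
    """Same-size 1D convolution with zero padding (kernel assumed symmetric)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def forward_model(sharp, depth, psf_radius_per_depth):
    """Layered rendering: mask each depth bin, blur it with that bin's PSF, sum the layers."""
    coded = [0.0] * len(sharp)
    for d, radius in enumerate(psf_radius_per_depth):
        layer = [s if z == d else 0.0 for s, z in zip(sharp, depth)]
        blurred = blur(layer, box_kernel(radius))
        coded = [c + b for c, b in zip(coded, blurred)]
    return coded


def mse(a, b):
    """Mean squared error, used here as the reconstruction loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy scene: two point sources, one per depth bin (assumed values).
sharp = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
true_depth = [0] * 5 + [1] * 5
psf_radii = [0, 1]  # bin 0 is in focus, bin 1 is blurred

captured = forward_model(sharp, true_depth, psf_radii)

# A wrong depth hypothesis renders an image that no longer matches the capture.
wrong_depth = [1] * 5 + [0] * 5
loss_true = mse(forward_model(sharp, true_depth, psf_radii), captured)
loss_wrong = mse(forward_model(sharp, wrong_depth, psf_radii), captured)
print(loss_true, loss_wrong)
```

In this toy the correct depth hypothesis reproduces the captured signal exactly while the swapped one does not, which is the kind of signal a reconstruction loss can exploit. In a zero-shot setting such a loss would be minimized over the outputs of a generator rather than compared across two hand-picked hypotheses.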

Paper Structure (20 sections, 2 equations, 7 figures, 6 tables)

This paper contains 20 sections, 2 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: System overview. We follow a neural field flow, where an implicit generator first maps the encoded input image into reconstruction representations (in our case, an all-in-focus image and a depth map). In the second part, a Differential Camera Model (DCM) simulates the acquisition process of a phase-coded imaging system and produces the acquired image. The gradients from the reconstruction loss flow back to the generator, forcing it to produce accurate intermediate outputs that match the acquired image after passing through the DCM. The bottom row illustrates the actual acquisition process compared to our neural field flow.
  • Figure 2: Hardware setup. Our setup is composed of a phase mask embedded inside the lens on the aperture plane. On the left, the optical components are shown separately. In the middle, the phase mask is located on the aperture plane. On the right is a schematic view of the phase mask pattern.
  • Figure 3: A qualitative comparison of depth map estimation. Our method generates more precise depth maps than the mono network from gil2019monster. To show the performance gap between the methods, we add an absolute-difference error map for each method; DPCIP tends to produce much more accurate depth maps in the background and on the main object than the mono network. Note that the depths are clipped to the imaging system's physical range.
  • Figure 4: A qualitative comparison of image reconstruction. Our method produces much more accurate all-in-focus images. The existing EDOF baseline elmalem2018learned produces over-smoothed all-in-focus images compared to our method.
  • Figure 5: Image deblurring. Our method outperforms Neural Deblurring ren2020neural. Even though we adapt Neural Deblurring to our deblurring subset, our method still outperforms it by a significant margin, making our approach more usable for real-world applications where the PSF kernels can be large.
  • ...and 2 more figures