DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model

Erez Yosef, Raja Giryes

TL;DR

This paper tackles high-quality reconstruction for lensless flat cameras, where the lens is replaced by a mask and the image is recovered from multiplexed sensor measurements. It introduces DifuzCam, a diffusion-prior-based reconstruction that uses a learned separable transform and a ControlNet to guide a pre-trained latent diffusion model, optionally leveraging textual scene descriptions. The model is trained with two losses, $l_{ldm}$ and $l_{sep}$. The method achieves state-of-the-art reconstruction quality on flat-camera benchmarks, with gains in perceptual and CLIP-based metrics. The work highlights the broader potential of diffusion priors for non-traditional imaging and suggests applicability to imaging systems beyond flat cameras.

Abstract

The flat lensless camera design reduces the camera size and weight significantly. In this design, the camera lens is replaced by another optical element that interferes with the incoming light. The image is recovered from the raw sensor measurements using a reconstruction algorithm. Yet, the quality of the reconstructed images is not satisfactory. To mitigate this, we propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction. This allows us to build a prototype flat camera with high-quality imaging, presenting state-of-the-art results in terms of both quality and perceptuality. We also demonstrate its ability to leverage textual descriptions of the captured scene to further enhance reconstruction. Our reconstruction method, which leverages the strong capabilities of a pre-trained diffusion model, can be used in other imaging systems for improved reconstruction results.
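To make the "learned separable transformation" mentioned in the abstract more concrete, the following is a minimal sketch of what a separable linear map over one sensor channel could look like, assuming it takes the form $X = W_L Y W_R$. The matrix names, the tiny sizes, and the random initialization are illustrative assumptions and are not taken from the paper; in the actual method the weights are learned during training.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def separable_transform(y, w_left, w_right):
    """Apply X = W_left @ Y @ W_right to a single sensor channel.

    One matrix mixes the rows and the other mixes the columns, so the map
    needs far fewer parameters than one dense matrix acting on the
    flattened measurement.
    """
    return matmul(matmul(w_left, y), w_right)


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (hypothetical initialization)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
sensor_h, sensor_w = 6, 8  # hypothetical sensor size
image_h, image_w = 4, 4    # hypothetical output size

y = random_matrix(sensor_h, sensor_w, rng)       # raw measurement, one channel
w_left = random_matrix(image_h, sensor_h, rng)   # acts on rows
w_right = random_matrix(sensor_w, image_w, rng)  # acts on columns

x = separable_transform(y, w_left, w_right)

# Parameter count of the separable map versus a dense map on flattened data.
separable_params = image_h * sensor_h + sensor_w * image_w
dense_params = (image_h * image_w) * (sensor_h * sensor_w)

print(len(x), len(x[0]))               # output shape
print(separable_params, dense_params)  # parameter counts
```

The parameter comparison at the end illustrates one likely reason a separable form is attractive for a separable mask: the number of weights grows with the sum of the two matrix sizes rather than with their product.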

Paper Structure (8 sections, 5 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Using (a) our prototype flat camera, we capture (b) a measurement image that is not visually interpretable. (c) An image is reconstructed from the measurements using our text-guided approach. Compare with (d) the reference image captured with a regular camera. (See \ref{fig:prototype_result_real} for details.)
  • Figure 2: DifuzCam Proposed Method. Flat lensless camera measurements are fed into a separable linear transformation followed by a ControlNet adapter, which controls the image generation process of a pre-trained latent diffusion model (LDM). Text guidance for the reconstruction process is optional in our method. During training, the weights of the yellow blocks are optimized and the orange paths are the training losses, while the weights of the blue blocks are pre-trained and fixed. The dataset for training and testing was captured using our prototype flat camera by projecting images onto a screen. We present a loss $l_{sep}$ in addition to the diffusion training loss ($l_{\mathcal{C}}$) for better convergence. The reconstructed image is obtained by $T$ iterative diffusion steps followed by decoding from latent space to pixel space.
  • Figure 3: Optical design. (a) The designed separable mask, which was printed on a chrome plate using lithography. (b) The PSF measured by the prototype camera with the coded mask.
  • Figure 4: Qualitative Results. Examples of reconstruction results for our captured dataset with the prototype camera we designed.
  • Figure 5: Results Comparison. We compare the results of our proposed method to the previous method FlatNet-T (Khan et al., 2020) on their published dataset with their network weights.
  • ...and 3 more figures