DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
Erez Yosef, Raja Giryes
TL;DR
This paper tackles high-quality image reconstruction for lensless flat cameras, in which the lens is replaced by a mask and the image is recovered from multiplexed sensor measurements. It introduces DifuzCam, a diffusion-prior-based reconstruction method that uses a learned separable transform and a ControlNet to guide a pre-trained latent diffusion model, optionally leveraging a textual description of the scene. The model is trained with two losses, $l_{ldm}$ and $l_{sep}$. The method achieves state-of-the-art reconstruction quality on flat-camera benchmarks, with gains in perceptual and CLIP-based metrics. The work highlights the broader potential of diffusion priors for non-traditional imaging and suggests applicability to imaging systems beyond flat cameras.
Abstract
The flat lensless camera design significantly reduces camera size and weight. In this design, the camera lens is replaced by another optical element that interferes with the incoming light, and the image is recovered from the raw sensor measurements using a reconstruction algorithm. Yet the quality of the reconstructed images is not satisfactory. To mitigate this, we propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction. This allows us to build a prototype flat camera with high-quality imaging, presenting state-of-the-art results in terms of both reconstruction quality and perceptual quality. We also demonstrate its ability to leverage textual descriptions of the captured scene to further enhance reconstruction. Our reconstruction method, which leverages the strong capabilities of a pre-trained diffusion model, can be used in other imaging systems for improved reconstruction results.
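To make the "learned separable transformation" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the transform acts on each color channel as a left and a right matrix multiplication, as in common separable flat-camera models. The names `separable_transform`, `w_left`, and `w_right`, and all array sizes, are hypothetical; in practice the two matrices would be learned parameters rather than random values.

```python
import numpy as np


def separable_transform(measurement, w_left, w_right):
    """Apply x_hat[c] = w_left @ measurement[c] @ w_right to every channel c.

    measurement: (C, H_s, W_s) raw sensor measurement (assumed layout)
    w_left:      (H_out, H_s) matrix acting on rows (hypothetical parameter)
    w_right:     (W_s, W_out) matrix acting on columns (hypothetical parameter)
    """
    return np.einsum("ij,cjk,kl->cil", w_left, measurement, w_right)


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
y = rng.standard_normal((3, 8, 10))    # 3-channel measurement on an 8x10 sensor
w_l = rng.standard_normal((4, 8))      # stands in for a learned left matrix
w_r = rng.standard_normal((10, 5))     # stands in for a learned right matrix

x_hat = separable_transform(y, w_l, w_r)   # intermediate image, shape (3, 4, 5)
```

Under this assumption the transform needs only `H_out * H_s + W_s * W_out` parameters per channel, whereas a dense linear map from sensor to image would need `H_out * W_out * H_s * W_s`. That saving is one reason a separable form is attractive when the sensor resolution is large.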
