GANESH: Generalizable NeRF for Lensless Imaging

Rakesh Raj Madavan, Akshat Kaimal, Badhrinarayanan K, Vinayak Gupta, Rohit Choudhary, Chandrakala Shanmuganathan, Kaushik Mitra

TL;DR

GANESH is a framework that jointly refines multi-view lensless images and synthesizes novel views. The authors report that it outperforms current approaches in reconstruction accuracy and refinement quality.

Abstract

Lensless imaging offers a significant opportunity to develop ultra-compact cameras by removing the conventional bulky lens system. However, without a focusing element, the sensor's output is no longer a direct image but a complex multiplexed scene representation. Traditional methods have attempted to address this challenge by employing learnable inversions and refinement models, but these methods are primarily designed for 2D reconstruction and do not generalize well to 3D reconstruction. We introduce GANESH, a novel framework designed to enable simultaneous refinement and novel view synthesis from multi-view lensless images. Unlike existing methods that require scene-specific training, our approach supports on-the-fly inference without retraining on each scene. Moreover, our framework allows us to tune our model to specific scenes, enhancing the rendering and refinement quality. To facilitate research in this area, we also present the first multi-view lensless dataset, LenslessScenes. Extensive experiments demonstrate that our method outperforms current approaches in reconstruction accuracy and refinement quality. Code and video results are available at https://rakesh-123-cryp.github.io/Rakesh.github.io/

Paper Structure

This paper contains 24 sections, 8 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: Reconstructing 3D scenes from multi-view lensless captures presents significant challenges. To tackle this, we propose GANESH, a novel framework that refines lensless captures while simultaneously rendering novel views. Existing 2D approaches address this task in a sequential, two-step process, resulting in suboptimal 3D reconstruction quality. In contrast, GANESH integrates these two stages into a unified framework, enabling joint optimization for superior novel view synthesis.
  • Figure 2: Overview of GANESH: 1) Given multi-view lensless images of a scene, we first Wiener deconvolve the lensless captures to obtain coarse images. 2) These are then passed to a deep convolutional network to extract features for every input view. 3) Using the source view features, we estimate the refined, rendered target view via an epipolar-based rendering pipeline. 4) By supervising this pipeline end-to-end on paired synthetic data, our model learns to inherently refine the coarse estimated images and simultaneously render novel views, eliminating the need for a separate refiner. Our method can directly generalize to any new scene during inference.
  • Figure 3: a) Lensless camera setup used to capture the real-world dataset LenslessScenes. b) The calibrated Point Spread Function (PSF) is used for simulating lensless captures in our synthetically generated dataset.
  • Figure 4: Qualitative results for the scene-specific experiment on the synthetic NeRF-LLFF dataset. The FlatNet+NeRF baseline exhibits significant artifacts and fails to preserve critical scene geometry. While FlatNet+GNT improves scene geometry reconstruction, it introduces excessive smoothing, resulting in the loss of high-frequency details. In contrast, our proposed method accurately reconstructs scene geometry and renders novel views, preserving high-frequency details and delivering superior visual fidelity. Note that the input to all the baselines and our model is the direct lensless capture. In the first column of this figure and all subsequent figures, we show the Wiener Deconvolution (WD) output for visualisation only.
  • Figure 5: Qualitative results for the generalizable setting on the synthetic NeRF-LLFF dataset. We observe that the FlatNet+IBRNet and FlatNet+GNT baselines fall short of our method in rendering high-fidelity novel views. Our approach demonstrates superior recovery of fine geometry and textures.
  • ...and 7 more figures
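
The Figure 2 caption names Wiener deconvolution as the first step, turning each lensless capture into a coarse image. The sketch below illustrates that step in NumPy under assumptions that do not come from the paper: a single-channel image, a shift-invariant PSF of the same size as the image and centered in its array, circular (FFT-based) convolution as the forward model, and a scalar regularizer `k`. The function names `simulate_lensless` and `wiener_deconvolve` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def simulate_lensless(scene, psf, noise_std=0.0, rng=None):
    """Toy forward model: circular convolution of the scene with the PSF,
    plus optional Gaussian noise. Assumes scene and psf share one 2D shape."""
    # Move the PSF center to index (0, 0) so the convolution adds no shift.
    H = np.fft.fft2(np.fft.ifftshift(psf))
    measurement = np.real(np.fft.ifft2(np.fft.fft2(scene) * H))
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        measurement = measurement + rng.normal(0.0, noise_std, measurement.shape)
    return measurement


def wiener_deconvolve(measurement, psf, k=1e-3):
    """Wiener filter in the Fourier domain: X = conj(H) * Y / (|H|^2 + k).
    The scalar k (an assumed constant) trades noise suppression for sharpness."""
    H = np.fft.fft2(np.fft.ifftshift(psf))
    W = np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(W * np.fft.fft2(measurement)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.random((64, 64))
    psf = rng.random((64, 64))
    psf /= psf.sum()  # normalize so the PSF preserves total intensity
    capture = simulate_lensless(scene, psf, noise_std=0.01, rng=rng)
    coarse = wiener_deconvolve(capture, psf, k=1e-2)
    print(coarse.shape)
```

In a pipeline like the one the caption describes, the coarse output of such a filter would be passed to the feature extractor rather than used as the final image. A larger `k` suppresses more noise at the cost of a blurrier estimate.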