Learning Radiance Fields from a Single Snapshot Compressive Image

Yunhao Li, Xiang Liu, Xiaodong Wang, Xin Yuan, Peidong Liu

TL;DR

This work tackles the recovery of 3D scene structure from a single snapshot produced by snapshot compressive imaging (SCI). It introduces two end-to-end frameworks, SCINeRF and SCISplat, that recover 3D scene representations from SCI measurements using NeRF and 3D Gaussian Splatting, respectively, while jointly optimizing scene content and camera poses under the SCI image formation model. SCISplat further achieves real-time rendering by employing differentiable Gaussian rasterization and a robust MCMC-based densification strategy, and it significantly outperforms state-of-the-art SCI decoders on synthetic and real data in both image reconstruction and novel-view synthesis. The results demonstrate the practicality of 3D-aware SCI decoding for high-frame-rate, multi-view-consistent rendering, with potential privacy and storage-efficiency advantages in practical deployments.
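Jointly optimizing camera poses requires a compact way to parameterize the N poses behind a single snapshot. The Figure 2 caption notes that SCINeRF constrains these poses with a spline. The sketch below assumes the simplest case, a linear spline between a start pose and an end pose over the short exposure, with positions interpolated linearly and rotations by quaternion slerp (rotation and translation are treated separately, which is a simplification). The function names and the (w, x, y, z) quaternion convention are hypothetical illustrations, not the authors' implementation; in an optimizer, only the two endpoint poses would be free parameters.

```python
import numpy as np

def slerp(q0: np.ndarray, q1: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between quaternions in (w, x, y, z) order."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:  # flip one quaternion so we interpolate along the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:  # nearly identical rotations: normalized lerp is stable
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_poses(q_start, p_start, q_end, p_end, n_frames: int):
    """Return n_frames (quaternion, position) pairs evenly spaced in exposure time.

    Hypothetical linear-spline trajectory: frame i sits at t = i / (n_frames - 1).
    """
    poses = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        q = slerp(np.asarray(q_start, float), np.asarray(q_end, float), t)
        p = (1.0 - t) * np.asarray(p_start, float) + t * np.asarray(p_end, float)
        poses.append((q, p))
    return poses

# Usage: 8 frames rotating from identity to a 90-degree yaw while moving 1 unit along x.
half = np.sqrt(0.5)
poses = interpolate_poses([1, 0, 0, 0], [0, 0, 0], [half, 0, 0, half], [1, 0, 0], 8)
```

Tying every frame to two endpoint poses keeps the number of pose parameters constant regardless of how many frames are compressed into the snapshot, which is one plausible motivation for a spline constraint.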

Abstract

In this paper, we explore the potential of the Snapshot Compressive Imaging (SCI) technique for recovering the underlying 3D scene structure from a single temporally compressed image. SCI is a cost-effective method that records high-dimensional data, such as hyperspectral or temporal information, into a single image using low-cost 2D imaging sensors. To achieve this, a series of specially designed 2D masks is usually employed, which reduces storage and transmission requirements and offers potential privacy protection. Inspired by this, we take one step further and recover the encoded 3D scene information by leveraging the powerful 3D scene representation capabilities of neural radiance fields (NeRF). Specifically, we propose SCINeRF, in which we formulate the physical imaging process of SCI as part of the training of NeRF, allowing us to exploit its impressive performance in capturing complex scene structures. In addition, we integrate the popular 3D Gaussian Splatting (3DGS) framework and propose SCISplat, which improves 3D scene reconstruction quality and training/rendering speed by explicitly optimizing point clouds into 3D Gaussian representations. To assess the effectiveness of our methods, we conduct extensive evaluations using both synthetic data and real data captured by our SCI system. Experimental results demonstrate that our proposed approach surpasses state-of-the-art methods in image reconstruction and novel view synthesis. Moreover, by combining SCI with the rendering capabilities of 3DGS, our method can render high-frame-rate, multi-view consistent images in real time. Code will be available at: https://github.com/WU-CVGL/SCISplat.
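Formulating the SCI imaging process as part of training means that rendered frames are compressed exactly as the camera would compress them, and the loss is computed against the single real measurement rather than against unknown ground-truth frames. A minimal sketch, assuming the standard video-SCI model in which each of N frames is multiplied element-wise by its own 2D mask and the results are summed on the sensor (Y = Σ_i M_i ⊙ X_i). The function names are hypothetical, and the photometric loss is taken to be mean squared error as an assumption; in SCINeRF and SCISplat the frames would come from NeRF or 3DGS renders, and gradients would flow back to the scene and the camera poses.

```python
import numpy as np

def sci_forward(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Compress N frames of shape (N, H, W) into one (H, W) measurement.

    Each frame is modulated element-wise by its own mask, and the modulated
    frames are summed, as during a single exposure on a 2D sensor.
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share the same (N, H, W) shape")
    return np.sum(masks * frames, axis=0)

def measurement_loss(rendered: np.ndarray, masks: np.ndarray, y: np.ndarray) -> float:
    """Photometric loss in the measurement domain (MSE is an assumption here).

    Rendered frames pass through the same forward model and are compared with
    the real measurement y, so no ground-truth frames are required.
    """
    y_hat = sci_forward(rendered, masks)
    return float(np.mean((y_hat - y) ** 2))

# Usage: simulate one measurement from 8 random frames with random binary masks.
rng = np.random.default_rng(0)
n, h, w = 8, 32, 32
frames = rng.random((n, h, w))
masks = (rng.random((n, h, w)) > 0.5).astype(np.float64)
y = sci_forward(frames, masks)
print(y.shape)                             # (32, 32): a single 2D image for 8 frames
print(measurement_loss(frames, masks, y))  # 0.0 when the true frames are "rendered"
```

Because the N frames collapse into one H × W array, storage and transmission cost roughly one image instead of N, which is the saving the abstract refers to; recovering the frames then relies entirely on the optimization driving this loss down.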

Paper Structure

This paper contains 29 sections, 17 equations, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: Given a single snapshot compressive image, our method can recover its underlying 3D scene representation. Leveraging the strong 3D scene representation and novel-view image synthesis capabilities of NeRF or 3DGS, we can recover high-quality multi-view consistent images from the single measurement.
  • Figure 2: Overview of the proposed methods. Both methods take the real SCI measurement $\mathbf{Y}$ and the modulation masks $\mathcal{M}$ as input to recover the compressed images and the underlying 3D scene structure. For SCINeRF, the camera poses $\mathbf{T}_i$ are constrained by a spline. The scene information, including volumetric density $\sigma$ and RGB color $\mathbf{c}$, is encoded in a lightweight MLP, which is then used to render the compressed multi-view images $\hat{\mathcal{X}}$ through volumetric rendering. To improve 3D representation quality and training/rendering speed, we further propose SCISplat, which incorporates 3DGS into SCI. For SCISplat, a set of degraded frames $\widetilde{\mathcal{X}}$ is first reconstructed from the real measurement $\mathbf{Y}$ and the modulation masks $\mathcal{M}$ using pixel interpolation. These frames are then fed into a learning-based Structure-from-Motion (SfM) module to generate an initial coarse point cloud $\mathcal{Q}$ and rough camera pose estimates $\mathbf{T}_i$, which initialize the parameters of the 3D Gaussians $g$. The compressed images $\hat{\mathcal{X}}$ are then rendered via differentiable rasterization. In both methods, the scene representation and camera poses are jointly optimized, primarily by minimizing the photometric loss between the synthesized measurement $\hat{\mathbf{Y}}$ (formed from the multi-view images rendered by NeRF or 3DGS) and the real SCI measurement $\mathbf{Y}$.
  • Figure 3: Experimental setup for real dataset collection. The SCI imaging system consists of a CCD camera that records the snapshot measurement, primary and relay lenses, and a digital micromirror device (DMD) that modulates the input frames.
  • Figure 4: Qualitative evaluations of our methods against SOTA SCI image restoration methods on the synthetic dataset. From top to bottom, the rows show results for different scenes: Airplants, Hotdog, Cozy2room, Factory, Tanabata, and Vender. The results demonstrate that SCINeRF and SCISplat achieve superior performance from a single SCI image (far-left column).
  • Figure 5: Qualitative evaluations of our methods against naive two-stage baselines. We compare the quality of novel-view images synthesized by our methods against those of vanilla 3DGS trained on the outputs of SOTA SCI methods. From top to bottom, the rows show different scenes. The qualitative comparisons demonstrate that our methods outperform existing approaches.
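The Figure 2 caption states that SCISplat first recovers a set of degraded frames from the measurement and masks by pixel interpolation and then passes them to SfM for initialization. The exact interpolation scheme is not described here, so the following is a minimal sketch under stated assumptions: binary masks; at pixels where frame i's mask is open, the frame is approximated by the measurement divided by the number of masks open at that pixel; and the remaining pixels are filled by repeatedly averaging already-known 4-neighbours. The functions `degraded_frames` and `fill_missing` and this fill rule are hypothetical, not the authors' method.

```python
import numpy as np

def fill_missing(values: np.ndarray, known: np.ndarray, max_iters: int = 100) -> np.ndarray:
    """Fill unknown pixels with the mean of known 4-neighbours, for up to max_iters passes."""
    out = np.where(known, values, 0.0).astype(np.float64)
    known = known.copy()
    for _ in range(max_iters):
        if known.all():
            break
        v = np.pad(out * known, 1)                # known values, zero-padded border
        k = np.pad(known.astype(np.float64), 1)   # known-pixel indicator, zero-padded
        # Sum the up/down/left/right neighbours for both values and counts.
        s = v[:-2, 1:-1] + v[2:, 1:-1] + v[1:-1, :-2] + v[1:-1, 2:]
        c = k[:-2, 1:-1] + k[2:, 1:-1] + k[1:-1, :-2] + k[1:-1, 2:]
        newly = (~known) & (c > 0)
        out[newly] = s[newly] / c[newly]
        known = known | newly
    return out

def degraded_frames(y: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Rough (N, H, W) frame estimates from one measurement y and binary masks."""
    open_count = masks.sum(axis=0)
    # Where at least one mask is open, y / count is the mean of the contributing frames.
    normalized = np.divide(y, open_count, out=np.zeros_like(y, dtype=np.float64),
                           where=open_count > 0)
    return np.stack([fill_missing(normalized, m > 0) for m in masks])

# Usage: a static scene (all frames identical) should be recovered exactly.
rng = np.random.default_rng(0)
scene = np.full((4, 16, 16), 0.5)
masks = (rng.random(scene.shape) > 0.5).astype(np.float64)
y = np.sum(masks * scene, axis=0)   # video-SCI forward model
est = degraded_frames(y, masks)
print(est.shape)                    # (4, 16, 16)
```

For moving scenes, the per-pixel average mixes content from several frames, so these estimates are only coarse; that is consistent with the caption describing them as degraded frames used for initialization rather than as final reconstructions.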