SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image

Zixu Wang, Hao Yang, Yu Guo, Fei Wang

TL;DR

SCIGS is the first method to reconstruct an explicit 3D scene from a single compressed image. It extends this capability to dynamic 3D scenes, where it outperforms current state-of-the-art methods.

Abstract

Snapshot Compressive Imaging (SCI) makes it possible to capture information in high-speed dynamic scenes, but it requires an efficient reconstruction method to recover the scene information. Despite promising results, current deep learning-based and NeRF-based reconstruction methods face challenges: 1) deep learning-based reconstruction methods struggle to maintain 3D structural consistency within scenes, and 2) NeRF-based reconstruction methods still face limitations in handling dynamic scenes. To address these challenges, we propose SCIGS, a variant of 3DGS, and develop a primitive-level transformation network that uses camera pose stamps and Gaussian primitive coordinates as embedding vectors. This approach removes the need for camera poses required by vanilla 3DGS and enhances multi-view 3D structural consistency in dynamic scenes by using the transformed primitives. Additionally, a high-frequency filter is introduced to eliminate the artifacts generated during the transformation. The proposed SCIGS is the first method to reconstruct an explicit 3D scene from a single compressed image, and it extends this capability to dynamic 3D scenes. Experiments on both static and dynamic scenes demonstrate that SCIGS not only enhances SCI decoding but also outperforms current state-of-the-art methods in reconstructing dynamic 3D scenes from a single compressed image. The code will be made available upon publication.

Paper Structure

This paper contains 16 sections, 12 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Given a single compressed image of a dynamic scene as input, the proposed SCIGS can reconstruct a high-quality dynamic 3D scene and recover multi-view consistent images.
  • Figure 2: The pipeline of the proposed SCIGS. The inputs are a set of randomly initialized 3D Gaussians, a camera pose, and as many camera pose stamps as the compression ratio. The transformation network takes the Gaussian primitives and the camera pose stamps as inputs and, followed by a high-frequency filter, outputs 3D Gaussians under the different camera pose stamps. These camera-pose-aware transformed 3D Gaussians are then rendered to images from the given camera viewpoint and modulated by a given set of masks to generate compressed images.
  • Figure 3: (a) illustrates an effective Gaussian and (b) an ineffective Gaussian. (c) and (d) show how the transformation network converts ineffective Gaussians to effective Gaussians.
  • Figure 4: Illustration of the principle of the proposed high-frequency filter. Gaussians that cause high-frequency artifacts are filtered to eliminate those artifacts.
  • Figure 5: Qualitative comparison on the synthetic dataset between our proposed method (SCIGS) and the state-of-the-art SCI image method (SCINeRF). From top to bottom: two static scenes (factory and tanabata) and two dynamic scenes (roundabout and flamingo). The experiments show that our method achieves comparable image recovery performance from a single compressed image in static scenes, while demonstrating superior performance in dynamic scenes.
  • ...and 3 more figures
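
The Figure 2 caption describes rendered images being modulated by a set of masks to produce the compressed measurement. The sketch below illustrates that modulation step only, under the commonly used SCI measurement model Y = sum over t of (M_t * X_t), where * is element-wise. This summary does not state the paper's exact formulation, so the model, the binary masks, the noise-free sum, and every name here (`sci_forward`, `frames`, `masks`) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def sci_forward(frames, masks):
    """Collapse T frames into one snapshot: Y = sum_t (M_t * X_t).

    frames: array of shape (T, H, W), one image per time step.
    masks:  array of shape (T, H, W), one modulation mask per time step.
    Assumed model; sensor noise and quantization are ignored.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must have the same shape")
    # Element-wise modulation, then summation over the time axis.
    return (frames * masks).sum(axis=0)


# Hypothetical example: compression ratio T = 8, 4x4 grayscale frames.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))               # stand-ins for rendered images
masks = rng.integers(0, 2, size=(T, H, W))   # assumed binary masks
snapshot = sci_forward(frames, masks)        # single compressed image, (H, W)
```

In a pipeline like the one the caption outlines, `frames` would presumably be the images rendered from the transformed Gaussians, one per camera pose stamp, so `T` matches the compression ratio. Comparing `snapshot` against the captured compressed image is one plausible training signal, though the summary does not specify the loss.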