Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats

Mingyang Xie; Haoming Cai; Sachin Shah; Yiran Xu; Brandon Y. Feng; Jia-Bin Huang; Christopher A. Metzler

TL;DR

Modern inverse rendering methods provide powerful novel view synthesis, which makes it possible to perform flash/no-flash reflection separation from unpaired measurements. This relaxation dramatically simplifies image acquisition compared with conventional paired flash/no-flash reflection separation methods.

Abstract

We introduce a simple yet effective approach for separating transmitted and reflected light. Our key insight is that the powerful novel view synthesis capabilities provided by modern inverse rendering methods (e.g., 3D Gaussian splatting) allow one to perform flash/no-flash reflection separation using unpaired measurements. This relaxation dramatically simplifies image acquisition over conventional paired flash/no-flash reflection separation methods. Through extensive real-world experiments, we demonstrate that our method, Flash-Splat, accurately reconstructs both transmitted and reflected scenes in 3D. Our method outperforms existing 3D reflection separation methods, which do not leverage illumination control, by a large margin. Our project webpage is at https://flash-splat.github.io/.
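As a rough illustration of the conventional paired setting that the abstract contrasts against, the sketch below subtracts an aligned no-flash image from a flash image. It is not the authors' code. The additive image model (ambient transmission plus reflection, with the flash adding light only to the transmission) and all names such as `flash_difference` and `flash_gain` are assumptions chosen for the example, and the images are synthetic.

```python
import numpy as np


def flash_difference(flash_img, noflash_img):
    """Subtract a no-flash image from a flash image and clip negatives to zero."""
    diff = np.asarray(flash_img, dtype=np.float64) - np.asarray(noflash_img, dtype=np.float64)
    return np.clip(diff, 0.0, None)


# Synthetic scene (assumed additive model, for illustration only).
rng = np.random.default_rng(0)
h, w = 8, 8
transmission_ambient = rng.uniform(0.1, 0.4, (h, w))  # transmitted scene under ambient light
flash_gain = rng.uniform(0.2, 0.5, (h, w))            # extra transmitted light added by the flash
reflection = rng.uniform(0.0, 0.3, (h, w))            # reflection, assumed unchanged by the flash

noflash = transmission_ambient + reflection
flash = transmission_ambient + flash_gain + reflection

# Aligned pair: the reflection cancels and only the flash-lit transmission remains.
flash_only = flash_difference(flash, noflash)

# Misaligned pair: a one-pixel shift of the no-flash image leaves residual artifacts.
misaligned = flash_difference(flash, np.roll(noflash, 1, axis=1))
misalignment_error = np.abs(misaligned - flash_gain).mean()
```

With perfectly aligned inputs the difference equals the flash-only transmission, while the shifted pair leaves a nonzero residual. That sensitivity to alignment is what makes the paired capture hard in practice and motivates the unpaired formulation.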

Paper Structure (21 sections, 8 equations, 17 figures, 1 table)

Introduction
Related work
Method
Paired 2D Flash/No-Flash
Unpaired 3D Flash/No-Flash
Proposed Pipeline for 3D Reflection Separation
Experimental Details
Dataset Collection
Baselines
Architecture and Optimization Details
Results
Ablation Studies
With and Without Flash Cues
Replacing 3DGS with NeRF
Discussions & Limitations
...and 6 more sections

Figures (17)

Figure 1: Left: We separate the 3D transmitted and reflected scenes by capturing some views with camera flash and some views with no flash. Right: Our proposed Flash-Splat method achieves much better separation than the state-of-the-art unsupervised 3D separation method NeRFReN (Guo et al., 2022).
Figure 2: Flash/No-Flash for Reflection Removal. The difference between paired flash and no-flash images is equivalent to a photo taken with flash in a dark environment, which yields a reflection-free image (top). This works because the flash increases the brightness of the transmission but not of the reflection. Note that the pairs must be tightly aligned for this method to work: even tiny vibrations, such as those caused by pressing the shutter button on a tripod-mounted camera, produce artifacts (bottom).
Figure 3: Our Intuition: Construct 2D and 3D "pseudo-pairs" as Cues for Reflection Removal. Flash-Splat does not require paired flash/no-flash data. During the data capture stage, we collect unpaired flash/no-flash images from different views. In (a), if we captured a no-flash image at View 2, we can learn a 3D representation from the flash images captured at other views, and then synthesize a novel view of the flash image at View 2. This creates a 2D pseudo-pair of flash and no-flash images at View 2. If we then take the difference between the pseudo-pair as in Figure 2, we get the transmission component of View 2 that is free of reflection. In (b), we reconstruct a 3D scene with flash using only the flash images (top); we also reconstruct a 3D scene without flash using only the no-flash images (bottom). This creates a 3D pseudo-pair of flash/no-flash scenes.
Figure 4: Method Overview. We use four 3DGS models (Kerbl et al., 2023) as our 3D representations for the transmitted scene with flash $\mathbf{T}_{F}$, the transmitted scene with no flash $\mathbf{T}_{N}$, the reflected scene $\mathbf{R}$, and the reflective fraction map $\beta$. Based on the flash/no-flash technique, $\mathbf{R}$ and $\beta$ are shared between the flash image and the no-flash image, while the relationship between $\mathbf{T}_{F}$ and $\mathbf{T}_{N}$ is close to linear. We initialize these four 3DGS models using cues from the 3D pseudo-pair (see Figure 3b and the subsection on 3D initialization). In each iteration of optimization, our method operates on a single view. This figure, for instance, shows a view where we captured a flash image; no no-flash image was captured at this view. As shown in the top row, we use $\mathbf{T}_{F}$, $\mathbf{R}$, and $\beta$ to render a flash image of this particular view and calculate losses against the captured ground-truth flash image. Additionally, based on the cues from 2D pseudo-pairs, we calculate the Pearson linearity loss between $\mathbf{T}_{F}$ and $\mathbf{T}_{N}$ to encourage a linear relationship between them (see Figure 3a and the subsection on 2D linearity). We then back-propagate the gradients and update the weights of the four 3DGS models.
Figure 5: The Office scene. Top, middle, and bottom rows are the captured images, separated transmissions, and separated reflections, respectively. Our reflection separation approach is far more effective than NeRFReN (Guo et al., 2022) and Dong et al. (2021).
...and 12 more figures
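
The per-view quantities named in the Figure 4 caption can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the captions do not give the compositing equation or the exact form of the Pearson linearity loss, so the additive model `T + beta * R` in `compose` and the `1 - r` form of the loss are assumptions, and every function and variable name here is hypothetical. Real training would render these maps from the 3DGS models and use an autodiff framework.

```python
import numpy as np


def compose(transmission, reflection, beta):
    """Assumed compositing model: transmission plus beta-weighted reflection."""
    return transmission + beta * reflection


def pearson_linearity_loss(t_flash, t_noflash, eps=1e-8):
    """Assumed loss form: one minus the Pearson correlation of the two renders."""
    x = np.asarray(t_flash, dtype=np.float64).ravel()
    y = np.asarray(t_noflash, dtype=np.float64).ravel()
    x = x - x.mean()
    y = y - y.mean()
    r = (x @ y) / (np.sqrt((x @ x) * (y @ y)) + eps)
    return 1.0 - r


rng = np.random.default_rng(0)
shape = (32, 32)

# Stand-ins for rendered maps at one view (synthetic, for illustration only).
t_noflash = rng.uniform(0.1, 0.4, shape)
t_flash = 2.0 * t_noflash + 0.1           # exactly linear in the no-flash transmission
reflection = rng.uniform(0.0, 0.3, shape)
beta = rng.uniform(0.2, 0.8, shape)

# The reflection and beta maps are shared between the flash and no-flash renders.
flash_render = compose(t_flash, reflection, beta)
noflash_render = compose(t_noflash, reflection, beta)

loss_linear = pearson_linearity_loss(t_flash, t_noflash)
loss_unrelated = pearson_linearity_loss(t_flash, rng.uniform(0.0, 1.0, shape))
```

A linearly related pair drives the loss toward zero regardless of the scale and offset between the two transmissions, whereas an unrelated pair gives a loss near one. Because the correlation is invariant to scale and offset, it suits a relationship described only as "close to linear".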
