Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting

Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song, Wenming Yang

TL;DR

The paper addresses appearance variations in Gaussian Splatting that cause floaters and color distortions across views. It introduces DAVIGS, a decoupled appearance modeling method that uses a global appearance embedding per view and 3D-consistent local features from multi-resolution hash grids to generate per-pixel affine transformations $\mathcal{M}(p) \in \mathbb{R}^{3\times 4}$ applied to rendered colors, thereby achieving 3D-consistent appearance changes without coupling to GS during rendering. Optimization combines image-space losses with a transformation regularizer $\mathcal{L}_\mathrm{ID}$ to keep transformations near identity, and a cell-based querying scheme reduces computation. DAVIGS is plug-and-play for multiple Gaussian Splatting baselines, delivering state-of-the-art rendering quality with minimal training time and VRAM, and improving 3D consistency on challenging appearance-variant scenes such as GLAV and PhotoTourism-like data.
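To make the per-pixel affine transformation in the TL;DR concrete, here is a minimal NumPy sketch. It assumes the $3\times 4$ matrix acts on the rendered color in homogeneous form $[r, g, b, 1]^\top$ and that the identity transformation is $[I \mid 0]$. The function names are hypothetical, and the mean-absolute-deviation form of the regularizer is an assumption for illustration; the summary does not give the exact definition of $\mathcal{L}_\mathrm{ID}$.

```python
import numpy as np

def identity_affine():
    """Return the 3x4 identity affine transformation [I | 0]."""
    return np.hstack([np.eye(3), np.zeros((3, 1))])

def apply_affine(rendered, M):
    """Transform rendered colors with per-pixel 3x4 affine matrices.

    rendered: (H, W, 3) rendered colors.
    M:        (H, W, 3, 4) one affine matrix per pixel.
    """
    ones = np.ones(rendered.shape[:2] + (1,), dtype=rendered.dtype)
    homogeneous = np.concatenate([rendered, ones], axis=-1)  # (H, W, 4)
    # Per-pixel matrix-vector product: out[h, w] = M[h, w] @ [r, g, b, 1].
    return np.einsum("hwij,hwj->hwi", M, homogeneous)

def identity_regularizer(M):
    """Mean absolute deviation of M from the identity transformation.

    Assumed form, chosen for illustration only.
    """
    return np.abs(M - identity_affine()).mean()
```

With every matrix equal to the identity transformation, `apply_affine` returns the rendered image unchanged and the regularizer is zero, which is the behavior the regularization term is meant to encourage.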

Abstract

Gaussian Splatting has emerged as a prominent 3D representation in novel view synthesis, but it still suffers from appearance variations, which are caused by various factors, such as modern camera ISPs, different time of day, weather conditions, and local light changes. These variations can lead to floaters and color distortions in the rendered images/videos. Recent appearance modeling approaches in Gaussian Splatting are either tightly coupled with the rendering process, hindering real-time rendering, or they only account for mild global variations, performing poorly in scenes with local light changes. In this paper, we propose DAVIGS, a method that decouples appearance variations in a plug-and-play and efficient manner. By transforming the rendering results at the image level instead of the Gaussian level, our approach can model appearance variations with minimal optimization time and memory overhead. Furthermore, our method gathers appearance-related information in 3D space to transform the rendered images, thus building 3D consistency across views implicitly. We validate our method on several appearance-variant scenes, and demonstrate that it achieves state-of-the-art rendering quality with minimal training time and memory usage, without compromising rendering speeds. Additionally, it provides performance improvements for different Gaussian Splatting baselines in a plug-and-play manner.

Paper Structure

This paper contains 24 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: (a) There are often global and local (orange box) appearance variations in real-world captures for 3D reconstruction. (b) Such appearance variations may lead to floaters. (c) Compared to existing SOTA methods, our DAVIGS achieves advanced reconstruction results with faster optimization and rendering.
  • Figure 2: Different appearance modeling approaches. The gray parts represent the rendering process in Gaussian Splatting, where $\mathbf{f}$ denotes the features of Gaussian primitives, and $\mathbf{c}$ denotes the colors. (a) Decoupled appearance modeling applies transformations to rendering results at the image level, which can be discarded after optimization. (b) Coupled appearance modeling addresses appearance variations by manipulating the features of Gaussian primitives, making it coupled with the rendering process.
  • Figure 3: Overall pipeline of DAVIGS. For each pixel $p$ in the rendered image $\mathcal{I}^r$, we calculate its 3D spatial position $\mathbf{x}$ by back-projecting with its depth $\mathcal{D}(p)$ in the depth map $\mathcal{D}$, and then look up the multi-resolution hash grids for its 3D consistent features $\{\mathbf{f}_i\}$. They are then concatenated with a view-dependent appearance embedding $\mathbf{l}$ and fed into an MLP $f$ to obtain a transformation matrix $\mathcal{M}(p) \in \mathbb{R}^{3\times 4}$, which is used to perform an affine transformation on the color $\mathcal{I}^r(p)$ to obtain $\mathcal{I}^t(p)$. The losses $\mathcal{L}_1$ and $\mathcal{L}_\text{D-SSIM}$ are calculated between the transformed image $\mathcal{I}^t$ and the ground truth image $\mathcal{I}$. A regularization term $\mathcal{L}_\mathrm{ID}$ is applied to $\mathcal{M}(p)$ to keep it close to the identity transformation matrix $\mathcal{M}_{\mathrm{ID}}$.
  • Figure 4: Cell-based query. (a) We divide the depth map $\mathcal{D}$ into non-overlapping cells. For cell $\mathcal{C}_{i, j}$ with mean depth $\mathcal{D}_{i,j}$, we perform back-projection on its center $\mathbf{o}_{i,j}$ to obtain spatial coordinate $\mathbf{x}_{i,j}$. We use $\mathbf{x}_{i,j}$ to query the appearance module to obtain the transformation matrix $\mathcal{M}_{i,j}$. (b) For pixel $p$, we transform its color with the bilinearly interpolated affine transformation matrix.
  • Figure 5: Qualitative comparison between DAVIGS and previous work. Floaters and other artifacts are pointed out by arrows.
  • ...and 4 more figures
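
The cell-based query described in the Figure 4 caption can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: `appearance_module` is a hypothetical callable standing in for the hash grids, appearance embedding, and MLP; the pinhole intrinsics `K`, the camera-to-world pose `c2w`, the half-pixel center convention, and the clamping at image borders are all assumptions; and the image size is assumed to be a multiple of the cell size.

```python
import numpy as np

def backproject(u, v, depth, K, c2w):
    """Lift pixel coordinates (u, v) with depth to world-space 3D points."""
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    cam = np.stack([x, y, depth], axis=-1)      # camera-space points
    return cam @ c2w[:3, :3].T + c2w[:3, 3]     # rotate and translate to world

def cell_based_matrices(depth, K, c2w, cell, appearance_module):
    """Query one 3x4 matrix per cell, then bilinearly interpolate per pixel.

    depth: (H, W) depth map, with H and W assumed to be multiples of `cell`.
    appearance_module: hypothetical callable, (N, 3) points -> (N, 3, 4).
    """
    H, W = depth.shape
    gh, gw = H // cell, W // cell
    # Mean depth of each non-overlapping cell.
    mean_depth = depth.reshape(gh, cell, gw, cell).mean(axis=(1, 3))
    # Cell centers in pixel coordinates.
    cu = (np.arange(gw) + 0.5) * cell
    cv = (np.arange(gh) + 0.5) * cell
    uu, vv = np.meshgrid(cu, cv)                # each (gh, gw)
    points = backproject(uu, vv, mean_depth, K, c2w)
    M_cells = appearance_module(points.reshape(-1, 3)).reshape(gh, gw, 3, 4)

    # Position of each pixel center on the grid of cell centers,
    # clamped so border pixels reuse the nearest cell's matrix.
    py = np.clip((np.arange(H) + 0.5) / cell - 0.5, 0, gh - 1)
    px = np.clip((np.arange(W) + 0.5) / cell - 0.5, 0, gw - 1)
    y0 = np.floor(py).astype(int)
    x0 = np.floor(px).astype(int)
    y1 = np.minimum(y0 + 1, gh - 1)
    x1 = np.minimum(x0 + 1, gw - 1)
    wy = (py - y0)[:, None, None, None]
    wx = (px - x0)[None, :, None, None]
    # Bilinear blend of the four neighboring cell matrices.
    top = M_cells[y0][:, x0] * (1 - wx) + M_cells[y0][:, x1] * wx
    bottom = M_cells[y1][:, x0] * (1 - wx) + M_cells[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy         # (H, W, 3, 4)
```

The point of the design is that the appearance module is evaluated once per cell rather than once per pixel, so a cell size of `cell` reduces the number of queries by roughly a factor of `cell * cell`, while the interpolation keeps the resulting transformation smooth across cell boundaries.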