VeGaS: Video Gaussian Splatting

Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek

TL;DR

The Video Gaussian Splatting (VeGaS) model is introduced; it outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data.

Abstract

Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.
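
To make the INR idea in the abstract concrete, the following is a minimal sketch of a function that maps a pixel location and a frame time to an RGB value. It is not the architecture of any model named above: the helper names `make_mlp` and `inr_rgb`, the layer sizes, the sine hidden activation, and the random (untrained) weights are all assumptions chosen for illustration.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build random, untrained weights for a small fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [
            [rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
            for _ in range(n_out)
        ]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def inr_rgb(layers, x, y, t):
    """Evaluate the network at continuous coordinates (x, y, t)."""
    h = [x, y, t]
    for i, (weights, biases) in enumerate(layers):
        z = [
            sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(weights, biases)
        ]
        if i == len(layers) - 1:
            # Sigmoid on the output keeps each colour channel in (0, 1).
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            # Sine hidden activation: an assumption of this sketch.
            h = [math.sin(v) for v in z]
    return h


layers = make_mlp([3, 16, 16, 3])
# The input is continuous, so a time between two frame indices is a valid query.
rgb = inr_rgb(layers, 0.25, 0.5, 0.1)
```

In a real INR the weights would be fitted so that the output matches the video's pixel colours; the video is then stored in the network parameters, which is why compression is natural and direct editing is hard.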

Paper Structure

This paper contains 18 sections, 26 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Graphical summary of our Video Gaussian Splatting (VeGaS) model. The initial step involves the use of diagonal 3D Gaussians and frames with equal distances. Then, dynamic frame fitting and Gaussian folding are employed to approximate nonlinear structures within a video stream. Each frame is modeled by 2D Gaussians obtained by conditioning of 3D Folded-Gaussians at frame occurrence time $t_i$. This representation allows for the creation of high-quality renderings of video data and facilitates a wide range of modifications.
  • Figure 2: Video editing. Note that VeGaS enables modification of selected objects on a global scale, including operations such as multiplication and scaling. The model was trained on the DAVIS dataset [davis].
  • Figure 3: The Folded-Gaussian distribution can capture both linear and nonlinear patterns. Note that the conditional distributions (marked in red) are classical Gaussians.
  • Figure 4: Video editing. Note that VeGaS permits selection of a single frame and modification of some of its elements. The model was trained on the DAVIS dataset [davis].
  • Figure 5: Frame interpolation. Qualitative results obtained by VeGaS and VGR [sun2024splatter] on a selected video object from the DAVIS dataset [davis]. Frames at times $t$ and $t+1$ are reconstructions of two consecutive original frames, while frames at times $t+1/4$, $t+2/4$, and $t+3/4$ are interpolated. Note that VeGaS produces slightly better results.
  • ...and 1 more figures
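
The captions of Figures 1 and 3 describe each frame as a set of 2D Gaussians obtained by conditioning a 3D distribution at the frame time $t_i$. The Folded-Gaussian family itself is not defined in this excerpt, so the sketch below shows only the classical case: conditioning an ordinary 3D Gaussian over (x, y, t) on a fixed t using the standard Schur-complement formulas. The function name `condition_on_time`, the returned marginal weight, and the example numbers are assumptions for illustration and are not taken from the VeGaS code.

```python
import math


def condition_on_time(mean, cov, t):
    """Condition a 3D Gaussian over (x, y, t) on a fixed time t.

    mean: [mx, my, mt]; cov: symmetric positive-definite 3x3 nested list.
    Returns (mean_2d, cov_2d, weight), where weight is the marginal
    density of the time coordinate evaluated at t.
    """
    mx, my, mt = mean
    var_t = cov[2][2]
    # Cross-covariances between the spatial coordinates and time.
    c_xt, c_yt = cov[0][2], cov[1][2]
    dt = t - mt
    # Conditional mean: shifted in proportion to the distance from mt.
    mean_2d = [mx + c_xt / var_t * dt, my + c_yt / var_t * dt]
    # Conditional covariance: Schur complement of the time variance.
    cov_2d = [
        [cov[0][0] - c_xt * c_xt / var_t, cov[0][1] - c_xt * c_yt / var_t],
        [cov[1][0] - c_yt * c_xt / var_t, cov[1][1] - c_yt * c_yt / var_t],
    ]
    # Marginal density of t: how strongly this Gaussian contributes at time t.
    weight = math.exp(-0.5 * dt * dt / var_t) / math.sqrt(2.0 * math.pi * var_t)
    return mean_2d, cov_2d, weight


# Hypothetical 3D Gaussian centred at t = 0.5, with x correlated with time.
mean = [0.0, 0.0, 0.5]
cov = [
    [1.0, 0.2, 0.3],
    [0.2, 1.0, 0.0],
    [0.3, 0.0, 0.25],
]
mean_2d, cov_2d, weight = condition_on_time(mean, cov, 1.0)
```

For an ordinary Gaussian the conditional mean moves linearly in t and the conditional covariance does not depend on t, so a single such component can only trace a straight path across frames. That limitation is consistent with the abstract's stated motivation for a family that captures nonlinear dynamics.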