GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting

Inseo Lee; Youngyoon Choi; Joonseok Lee

TL;DR

GaussianVideo represents video with deformable 2D Gaussian Splatting, coupling a multi-plane spatiotemporal encoder with a lightweight decoder to model dynamic frames efficiently. The decoder predicts time-conditioned changes in the color, coordinates, and shape of an initial set of Gaussians, and temporal gradients guide the initialization so that dynamic regions are prioritized. Key contributions are the deformable 2D Gaussian framework, a multi-plane encoder for scalable high-dimensional encoding, and the temporal-gradient initialization. Compared with state-of-the-art NeRV-based methods, the approach reports up to 78.4% lower GPU memory usage, 5.5x faster training, and 12.5x faster decoding while maintaining competitive PSNR.

Abstract

Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression, outperforming traditional codecs. As model size grows, however, slow encoding and decoding speed and high memory consumption hinder its application in practice. To address these limitations, we propose a new video representation and compression method based on 2D Gaussian Splatting to efficiently handle video data. Our proposed deformable 2D Gaussian Splatting dynamically adapts the transformation of 2D Gaussians at each frame, significantly reducing memory cost. Equipped with a multi-plane-based spatiotemporal encoder and a lightweight decoder, it predicts changes in color, coordinates, and shape of initialized Gaussians, given the time step. By leveraging temporal gradients, our model effectively captures temporal redundancy at negligible cost, significantly enhancing video representation efficiency. Our method reduces GPU memory usage by up to 78.4%, and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding compared to the state-of-the-art NeRV methods.
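The abstract does not spell out how temporal gradients are used, so the following is only a minimal sketch of one plausible reading: place more initial Gaussians where the video changes most over time. The function names, the use of the mean absolute frame difference as the temporal gradient, and the `uniform_mix` parameter (which keeps some Gaussians in static regions) are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def temporal_gradient_map(video):
    """Mean absolute difference between consecutive frames.

    video: array of shape (T, H, W) or (T, H, W, C).
    Returns an (H, W) map; larger values mean more motion at that pixel.
    This finite-difference proxy is an assumption of the sketch.
    """
    video = np.asarray(video, dtype=np.float64)
    diffs = np.abs(np.diff(video, axis=0))  # (T-1, H, W[, C])
    if diffs.ndim == 4:
        diffs = diffs.mean(axis=-1)  # average over color channels
    return diffs.mean(axis=0)


def sample_gaussian_centers(video, num_gaussians, uniform_mix=0.1, seed=0):
    """Sample initial 2D Gaussian centers, biased toward dynamic pixels.

    uniform_mix (hypothetical parameter) is the share of probability mass
    spread uniformly, so static regions still receive some Gaussians.
    Returns an integer array of shape (num_gaussians, 2) holding (x, y).
    """
    grad = temporal_gradient_map(video)
    h, w = grad.shape
    uniform = np.full(h * w, 1.0 / (h * w))
    total = grad.sum()
    if total > 0:
        probs = (1.0 - uniform_mix) * grad.ravel() / total + uniform_mix * uniform
    else:
        probs = uniform  # fully static video: fall back to uniform sampling
    probs = probs / probs.sum()  # guard against floating-point drift
    rng = np.random.default_rng(seed)
    idx = rng.choice(h * w, size=num_gaussians, replace=True, p=probs)
    ys, xs = np.unravel_index(idx, (h, w))
    return np.stack([xs, ys], axis=1)
```

For a clip in which only a small patch changes between frames, most sampled centers land inside that patch, while the uniform share keeps a few Gaussians available for the static background.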
