GFix: Perceptually Enhanced Gaussian Splatting Video Compression

Siyue Teng, Ge Gao, Duolikun Danier, Yuxuan Jiang, Fan Zhang, Thomas Davis, Zoe Liu, David Bull

TL;DR

The paper tackles perceptual degradation in 3D Gaussian Splatting (3DGS) video codecs by leveraging diffusion priors through the Noise–Artifact Alignment principle. It introduces GFix, a perceptual enhancement framework that performs single-step, adaptive diffusion denoising guided by a learnable stepsize and a compact prompt, together with a modulated LoRA (mLoRA) adapter to enable efficient adaptation with minimal bitrate. The authors validate artifact–noise alignment on the UVG dataset and demonstrate substantial perceptual improvements, achieving up to $72.1\%$ BD-rate savings in LPIPS and $21.4\%$ in FID relative to GSVC, with competitive VMAF gains. The approach yields strong perceptual quality improvements with a highly compressed update stream, suggesting practical benefits for real-time or streaming scenarios and setting the stage for future GOP-based or super-resolution extensions.

Abstract

3D Gaussian Splatting (3DGS) enhances 3D scene reconstruction through explicit representation and fast rendering, demonstrating potential benefits for various low-level vision tasks, including video compression. However, existing 3DGS-based video codecs generally exhibit more noticeable visual artifacts and relatively low compression ratios. In this paper, we specifically target the perceptual enhancement of 3DGS-based video compression, based on the assumption that artifacts from 3DGS rendering and quantization resemble noisy latents sampled during diffusion training. Building on this premise, we propose a content-adaptive framework, GFix, comprising a streamlined, single-step diffusion model that serves as an off-the-shelf neural enhancer. Moreover, to increase compression efficiency, we propose a modulated LoRA scheme that freezes the low-rank decompositions and modulates the intermediate hidden states, thereby achieving efficient adaptation of the diffusion backbone with highly compressible updates. Experimental results show that GFix delivers strong perceptual quality enhancement, outperforming GSVC with up to 72.1% BD-rate savings in LPIPS and 21.4% in FID.
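
As a rough illustration of the modulated LoRA idea described above (frozen low-rank factors, with only the intermediate hidden state modulated), the sketch below applies an elementwise scale to the rank-r hidden state of a single linear layer. This is a minimal sketch, not the authors' implementation: the elementwise form of the modulation, the function names `mlora_forward` and `quantize`, and the quantization step of 0.25 are all assumptions made for illustration. The rounding step mirrors the caption of Figure 3, which mentions a modulation map quantized by rounding at inference.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def mlora_forward(W, A, B, m, x):
    """Frozen base layer plus a frozen low-rank path whose rank-r hidden
    state is rescaled by a modulation vector m (assumed elementwise)."""
    base = matvec(W, x)                                   # frozen backbone weight
    hidden = matvec(A, x)                                 # frozen down-projection, length r
    hidden = [m_k * h_k for m_k, h_k in zip(m, hidden)]   # only m is adapted per content
    update = matvec(B, hidden)                            # frozen up-projection
    return [b + u for b, u in zip(base, update)]


def quantize(m, step=0.25):
    """Round the modulation vector to a uniform grid (hypothetical step size)."""
    return [round(v / step) * step for v in m]


# Toy example: 2-dimensional layer with rank r = 1.
W = [[1, 0], [0, 1]]   # frozen base weight (identity)
A = [[1, 1]]           # frozen down-projection (1 x 2)
B = [[1], [0]]         # frozen up-projection (2 x 1)
x = [1, 2]

m_hat = quantize([0.6])                  # the only content-specific values
y = mlora_forward(W, A, B, m_hat, x)
```

Under these assumptions, only `m_hat` would need to be transmitted per sequence, since `W`, `A`, and `B` are shared and frozen. Setting the modulation to zero recovers the output of the unadapted base layer.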

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: (Left) Illustration of learnable stepsize. (Right) Average MMD between Gaussian compression artifacts (different compression ratios) and partially noisy images.
  • Figure 2: Single-step denoising results at varying noise levels ($\Delta \tau$), with visual comparisons (top) and quantitative metrics (bottom).
  • Figure 3: (Left) GFix framework overview. During decoding, the bitstream is decoded by arithmetic decoding, restoring the reconstructed content of GSVC and the quantized modulation map (based on rounding during inference) $\hat{\mathbf{M}}$. (Right) mLoRA construction.
  • Figure 4: Average rate-quality curves on the UVG dataset. The GSVC results shown here differ from the values reported for GSVC, which can be attributed to the much shorter sequence length used for evaluation (first 96 frames vs. 600 frames in GSVC).
  • Figure 5: Visual comparisons of NeRV, GSVC, and the proposed GFix.
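
Figure 1 reports the maximum mean discrepancy (MMD) between compression artifacts and partially noisy images. For readers unfamiliar with the measure, the following sketch computes a biased estimate of squared MMD between two sets of feature vectors. The Gaussian (RBF) kernel, the bandwidth, and the use of plain feature vectors are assumptions made for illustration; this page does not state which kernel or feature space the authors use.

```python
import math


def rbf(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys."""
    def mean_kernel(ps, qs):
        total = sum(rbf(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy 1-D "features": two nearby sets and one distant set.
artifacts = [[0.0], [1.0]]
noisy_close = [[0.1], [1.1]]
noisy_far = [[5.0], [6.0]]

close_score = mmd_squared(artifacts, noisy_close)
far_score = mmd_squared(artifacts, noisy_far)
```

A smaller value indicates that the two sample sets are harder to tell apart under the chosen kernel. In the setting of Figure 1, one would compare such scores across noise levels to see where artifacts and noisy images align most closely.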