CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video

Hojun Song, Heejung Choi, Aro Kim, Chae-yeong Song, Gahyeon Kim, Soo Ye Kim, Jaehyup Lee, Sang-hyo Park

TL;DR

CompSplat tackles novel view synthesis from long, unposed, real-world videos, which are typically compressed. It introduces a compression-aware optimization framework that jointly models per-frame codec characteristics and training stability cues. Each frame receives a confidence score $q_t = q_t^q + q_t^b$, where $q_t^q = \lambda^q (Q_{max}^f - Q_t^f)/(Q_{max}^f - Q_{min}^f + \varepsilon)$ rewards a low quantization parameter (QP) and $q_t^b = \lambda^b (B_t^f - B_{min}^f)/(B_{max}^f - B_{min}^f + \varepsilon)$ rewards a high bitrate; the score is then smoothed with an exponential moving average (EMA) $\bar{q}_t$. The method introduces Quality-guided Density Control, which adapts Gaussian densification and pruning through the thresholds $\theta_t$ and $\omega'_t$, and a Quality Gap-aware Masking mechanism, which down-weights photometric supervision on views with poor feature matches using the inlier ratio $r_t = I_t/(K_t + \varepsilon)$ and the drop rate $d_t = \eta (1 - r_t)$. Experiments on Tanks and Temples, Free, and Hike show state-of-the-art rendering fidelity and pose accuracy under heavy compression, demonstrating practical robustness for real-world, bandwidth-constrained video capture. By explicitly accounting for codec-induced quality variations, CompSplat makes compression-aware 3D Gaussian Splatting more feasible for long unposed video reconstruction, enabling more reliable digital twins and immersive experiences under realistic conditions.
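The frame-wise confidence score and its EMA smoothing can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the minimum and maximum QP and bitrate are taken over all frames of the sequence, uses hypothetical default weights `lam_q = lam_b = 0.5`, and assumes a standard exponential moving average with a hypothetical decay `beta`, since the summary names EMA smoothing but does not give its exact form.

```python
from typing import List

EPS = 1e-8


def frame_confidence(qp: List[float], bitrate: List[float],
                     lam_q: float = 0.5, lam_b: float = 0.5,
                     eps: float = EPS) -> List[float]:
    """Compute q_t = q_t^q + q_t^b for every frame.

    Assumption: min/max statistics are taken over the whole sequence;
    lam_q and lam_b are hypothetical default weights.
    """
    q_min, q_max = min(qp), max(qp)
    b_min, b_max = min(bitrate), max(bitrate)
    conf = []
    for q, b in zip(qp, bitrate):
        # Lower QP means lighter quantization, so confidence rises.
        q_term = lam_q * (q_max - q) / (q_max - q_min + eps)
        # Higher bitrate means more bits spent on the frame, so confidence rises.
        b_term = lam_b * (b - b_min) / (b_max - b_min + eps)
        conf.append(q_term + b_term)
    return conf


def ema_smooth(values: List[float], beta: float = 0.9) -> List[float]:
    """Assumed EMA form: s_t = beta * s_{t-1} + (1 - beta) * q_t, seeded with q_0."""
    smoothed = []
    prev = None
    for v in values:
        prev = v if prev is None else beta * prev + (1.0 - beta) * v
        smoothed.append(prev)
    return smoothed
```

For example, frames with QP values `[22, 27, 32, 37]` and bitrates `[900, 600, 400, 200]` receive monotonically decreasing confidence, from about 1.0 for the least compressed frame to 0.0 for the most compressed one; the EMA then damps abrupt frame-to-frame jumps before the score drives density control.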

Abstract

High-quality novel view synthesis (NVS) from real-world videos is crucial for applications such as cultural heritage preservation, digital twins, and immersive media. However, real-world videos typically contain long sequences with irregular camera trajectories and unknown poses, leading to pose drift, feature misalignment, and geometric distortion during reconstruction. Moreover, lossy compression amplifies these issues by introducing inconsistencies that gradually degrade geometry and rendering quality. While recent studies have addressed either long-sequence NVS or unposed reconstruction, compression-aware approaches still focus on specific artifacts or limited scenarios, leaving diverse compression patterns in long videos insufficiently explored. In this paper, we propose CompSplat, a compression-aware training framework that explicitly models frame-wise compression characteristics to mitigate inter-frame inconsistency and accumulated geometric errors. CompSplat incorporates compression-aware frame weighting and an adaptive pruning strategy to enhance robustness and geometric consistency, particularly under heavy compression. Extensive experiments on challenging benchmarks, including Tanks and Temples, Free, and Hike, demonstrate that CompSplat achieves state-of-the-art rendering quality and pose accuracy, significantly surpassing most recent state-of-the-art NVS approaches under severe compression conditions.

Paper Structure (41 sections, 13 equations, 17 figures, 21 tables)

Figures (17)

  • Figure 1: CompSplat achieves high-quality novel view synthesis from real-world compressed videos. Given (a) compressed video input, our approach leverages (b) compression information showing per-frame quality variations from different quantization parameters. Due to degraded inputs from compression, previous methods (c) NoPe-NeRF, (d) LocalRF, and (e) LongSplat generate blurry or distorted results. In contrast, through compression-aware optimization, (f) our proposed method produces clear reconstructions with fine details.
  • Figure 2: Overview of the CompSplat pipeline: (a) Our approach builds upon an unposed-GS framework, reconstructing a 3D Gaussian scene from compressed videos through incremental pose estimation and optimization. (b) Frame-wise compression information (QP and bitrates) is converted into a confidence score. (c) We introduce Quality-guided Density Control, which regulates Gaussian optimization based on frame reliability: (c.1) Scale-based pruning removes over-diffused Gaussians that primarily arise in low-quality frames by leveraging frame confidence. (c.2) Adaptive Densification and Pruning adjust densification gradient and pruning opacity thresholds based on frame confidence. (d) Quality Gap-aware Masking mitigates frame-to-frame quality differences by applying a gap ratio–based pixel mask.
  • Figure 3: Frame-wise compression analysis. During video compression, each frame is encoded with its own QP value, leading to substantial inter-frame variations in PSNR and bitrate. This non-uniformity becomes more pronounced in long real-world videos, highlighting the need to explicitly consider frame-wise compression artifacts when applying 3DGS to compressed video.
  • Figure 4: Qualitative comparison on the Free dataset [free]. We compare our method against CF-3DGS [cf3dgs], NoPe-NeRF [nopenerf], LocalRF [localrf], and LongSplat [longsplat]. CF-3DGS produces highly diffused or incorrect Gaussians on compressed datasets, making object and scene structures difficult to recognize. Other baseline methods also yield blurry or geometrically distorted reconstructions. LongSplat performs relatively well; however, when combined with our approach, the results exhibit sharper textures and clearer object boundaries, demonstrating improved reconstruction quality even under compressed conditions.
  • Figure 5: Visualization of camera trajectories on the Free dataset [free]. CF-3DGS fails to estimate a valid camera trajectory because it runs out of memory (OOM) when processing long sequences. Compared to LongSplat and other baselines, our method produces more stable and consistent camera trajectories on compressed video frames.
  • ...and 12 more figures
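The Quality Gap-aware Masking step (inlier ratio $r_t = I_t/(K_t + \varepsilon)$, drop rate $d_t = \eta (1 - r_t)$, and the gap-ratio-based pixel mask shown in Figure 2) can be sketched as below. This reflects an assumed reading rather than the authors' code: pixels are dropped independently at random with probability $d_t$, the remaining pixels enter an L1 photometric loss, and the helper names, the default `eta`, and the Bernoulli sampling are all hypothetical choices not confirmed by the summary.

```python
import numpy as np


def drop_rate(num_inliers: int, num_matches: int,
              eta: float = 0.5, eps: float = 1e-8) -> float:
    """d_t = eta * (1 - r_t) with r_t = I_t / (K_t + eps); eta is a hypothetical default."""
    r = num_inliers / (num_matches + eps)
    return eta * (1.0 - r)


def random_pixel_mask(height: int, width: int, d: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Assumed Bernoulli mask: each pixel is kept with probability 1 - d."""
    return (rng.random((height, width)) >= d).astype(np.float32)


def masked_l1(rendered: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """L1 photometric loss on kept pixels of (H, W, 3) images.

    Normalized by the full pixel count, so a higher drop rate lowers the
    frame's total photometric weight instead of merely reweighting pixels.
    """
    diff = np.abs(rendered - target).mean(axis=-1)  # per-pixel error, shape (H, W)
    return float((diff * mask).sum() / diff.size)
```

With 80 inliers out of 100 matches and `eta = 0.5`, the drop rate is 0.1, so roughly 10% of that frame's pixels are excluded; a view whose matches are all inliers keeps every pixel.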