CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video
Hojun Song, Heejung Choi, Aro Kim, Chae-yeong Song, Gahyeon Kim, Soo Ye Kim, Jaehyup Lee, Sang-hyo Park
TL;DR
CompSplat tackles novel view synthesis from long, unposed, real-world videos, which are typically compressed. It proposes a compression-aware optimization framework that jointly models per-frame codec characteristics and training stability cues through a frame-wise confidence $q_t = q_t^q + q_t^b$, where $q_t^q = \lambda^q (Q_{max}^f - Q_t^f)/(Q_{max}^f - Q_{min}^f + \varepsilon)$ and $q_t^b = \lambda^b (B_t^f - B_{min}^f)/(B_{max}^f - B_{min}^f + \varepsilon)$, smoothed with an exponential moving average (EMA) to give $\bar{q}_t$. It further adds two components: Quality-guided Density Control, which adapts Gaussian densification and pruning via thresholds $\theta_t$ and $\omega'_t$, and Quality Gap-aware Masking, which down-weights photometric supervision on views with poor feature matches using an inlier ratio $r_t = I_t/(K_t + \varepsilon)$ and a drop rate $d_t = \eta (1 - r_t)$. Experiments on Tanks and Temples, Free, and Hike show state-of-the-art rendering fidelity and pose accuracy under heavy compression, demonstrating robustness for real-world, bandwidth-constrained video capture. By explicitly accounting for codec-induced quality variations, CompSplat makes 3D Gaussian Splatting from long unposed video more practical, enabling more reliable digital twins and immersive experiences under realistic conditions.
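The frame-wise quantities in the summary above can be sketched in a few lines of Python. This is only an illustrative reading of the formulas, not the authors' implementation. It assumes that $Q_t^f$ and $B_t^f$ are per-frame codec statistics (interpreted here as a quantization level and a bit count), that the min and max are taken over the whole sequence, and that the EMA uses a standard recursive form with a hypothetical coefficient `alpha`. The function names and the default values of `lam_q`, `lam_b`, `eta`, and `alpha` are placeholders chosen for the example.

```python
from typing import List, Sequence

EPS = 1e-8  # plays the role of epsilon in the formulas (value is an assumption)


def frame_confidence(q_vals: Sequence[float], b_vals: Sequence[float],
                     lam_q: float = 0.5, lam_b: float = 0.5,
                     eps: float = EPS) -> List[float]:
    """q_t = q_t^q + q_t^b, with min/max taken over the given sequence (assumed)."""
    q_min, q_max = min(q_vals), max(q_vals)
    b_min, b_max = min(b_vals), max(b_vals)
    scores = []
    for q, b in zip(q_vals, b_vals):
        # Lower Q_t gives a higher score: lam_q * (Q_max - Q_t) / (Q_max - Q_min + eps)
        q_term = lam_q * (q_max - q) / (q_max - q_min + eps)
        # Higher B_t gives a higher score: lam_b * (B_t - B_min) / (B_max - B_min + eps)
        b_term = lam_b * (b - b_min) / (b_max - b_min + eps)
        scores.append(q_term + b_term)
    return scores


def ema_smooth(values: Sequence[float], alpha: float = 0.9) -> List[float]:
    """Recursive EMA; this exact form and alpha are assumptions, not from the paper."""
    smoothed: List[float] = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * prev + (1.0 - alpha) * v
        smoothed.append(prev)
    return smoothed


def drop_rate(inliers: int, matches: int, eta: float = 0.5,
              eps: float = EPS) -> float:
    """d_t = eta * (1 - r_t), with inlier ratio r_t = I_t / (K_t + eps)."""
    r = inliers / (matches + eps)
    return eta * (1.0 - r)


if __name__ == "__main__":
    q_vals = [20.0, 30.0, 40.0]     # made-up per-frame Q values
    b_vals = [900.0, 500.0, 100.0]  # made-up per-frame B values
    conf = frame_confidence(q_vals, b_vals)
    print(conf, ema_smooth(conf), drop_rate(50, 100))
```

With these made-up inputs, the frame with the lowest $Q$ and highest $B$ receives the largest confidence and the frame with the highest $Q$ and lowest $B$ receives zero, while a view with half of its matches as inliers gets a drop rate of roughly $\eta/2$.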
Abstract
High-quality novel view synthesis (NVS) from real-world videos is crucial for applications such as cultural heritage preservation, digital twins, and immersive media. However, real-world videos typically contain long sequences with irregular camera trajectories and unknown poses, leading to pose drift, feature misalignment, and geometric distortion during reconstruction. Moreover, lossy compression amplifies these issues by introducing inconsistencies that gradually degrade geometry and rendering quality. While recent studies have addressed either long-sequence NVS or unposed reconstruction, compression-aware approaches still focus on specific artifacts or limited scenarios, leaving diverse compression patterns in long videos insufficiently explored. In this paper, we propose CompSplat, a compression-aware training framework that explicitly models frame-wise compression characteristics to mitigate inter-frame inconsistency and accumulated geometric errors. CompSplat incorporates compression-aware frame weighting and an adaptive pruning strategy to enhance robustness and geometric consistency, particularly under heavy compression. Extensive experiments on challenging benchmarks, including Tanks and Temples, Free, and Hike, demonstrate that CompSplat achieves state-of-the-art rendering quality and pose accuracy, significantly surpassing most recent state-of-the-art NVS approaches under severe compression conditions.
