MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance
Kaizhe Zhang, Shinan Chen, Qian Zhao, Weizhan Zhang, Caixia Yan, Yudeng Xin
TL;DR
3D Gaussian Splatting (3DGS) struggles to render high-resolution views when trained on low-resolution inputs. The authors propose MVGSR, a multi-view SR framework that selects auxiliary views from camera poses and uses an epipolar-constrained multi-view attention mechanism to fuse information across views, improving high-frequency detail and geometric consistency. The SR network combines multi-view features with a single-image prior and uses a sub-pixel, anti-aliased loss to supervise 3DGS rendering. Experiments on NeRF Synthetic, Tanks & Temples, and Mip-NeRF 360 demonstrate state-of-the-art performance on object-centric and scene-level 3DGS SR benchmarks, with improved cross-view consistency and detail fidelity. MVGSR enables high-resolution novel view synthesis (HRNVS) on arbitrarily organized multi-view data without strict temporal continuity or view ordering, which is a practical benefit for real-world multi-view capture.
Abstract
Scenes reconstructed by 3D Gaussian Splatting (3DGS) trained on low-resolution (LR) images are unsuitable for high-resolution (HR) rendering. Consequently, a 3DGS super-resolution (SR) method is needed to bridge LR inputs and HR rendering. Early 3DGS SR methods rely on single-image SR networks, which lack cross-view consistency and fail to fuse complementary information across views. More recent video-based SR approaches attempt to address this limitation but require strictly sequential frames, limiting their applicability to unstructured multi-view datasets. In this work, we introduce Multi-View Consistent 3D Gaussian Splatting Super-Resolution (MVGSR), a framework that integrates multi-view information so that 3DGS renders high-frequency details with enhanced consistency. We first propose an Auxiliary View Selection Method based on camera poses, which makes our method applicable to arbitrarily organized multi-view datasets without requiring temporal continuity or data reordering. Furthermore, we introduce, for the first time, an epipolar-constrained multi-view attention mechanism into 3DGS SR, which serves as the core of our proposed multi-view SR network. This design enables the model to selectively aggregate consistent information from auxiliary views, enhancing the geometric consistency and detail fidelity of 3DGS representations. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both object-centric and scene-level 3DGS SR benchmarks.
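
To make the idea of pose-based auxiliary view selection concrete, the following is a minimal sketch and not the authors' method. The abstract states only that selection is based on camera poses; the scoring rule here (normalized camera-center distance plus a weighted viewing-angle difference), the function name `select_auxiliary_views`, the parameters `k` and `angle_weight`, and the convention that cameras look along the +z axis of their camera-to-world matrix are all assumptions made for illustration.

```python
import numpy as np

def select_auxiliary_views(c2w, ref_idx, k=2, angle_weight=1.0):
    """Pick k auxiliary views for a reference view using camera poses only.

    c2w: (N, 4, 4) camera-to-world matrices (assumed convention: the third
         rotation column is the viewing direction).
    Returns the indices of the k views with the lowest pose-difference score.
    Hypothetical scoring rule, chosen for illustration.
    """
    centers = c2w[:, :3, 3]                      # camera positions in world space
    forward = c2w[:, :3, 2]                      # viewing directions
    forward = forward / np.linalg.norm(forward, axis=1, keepdims=True)

    # Positional term: distance to the reference camera, scaled to [0, 1].
    dist = np.linalg.norm(centers - centers[ref_idx], axis=1)
    scale = dist.max()
    if scale > 0:
        dist = dist / scale

    # Angular term: angle between viewing directions, scaled to [0, 1].
    cos = np.clip(forward @ forward[ref_idx], -1.0, 1.0)
    angle = np.arccos(cos) / np.pi

    score = dist + angle_weight * angle
    score[ref_idx] = np.inf                      # never select the reference itself
    return np.argsort(score)[:k]
```

Because the score depends only on poses, the order in which views are stored does not matter, which is the property the abstract highlights: no temporal continuity or reordering is needed.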

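One common way to impose an epipolar constraint on cross-view attention is to let each reference pixel attend only to auxiliary pixels near its epipolar line. The sketch below illustrates that general technique under stated assumptions; it is not the MVGSR network. It assumes pinhole intrinsics, a relative pose `(R, t)` mapping reference-camera coordinates to auxiliary-camera coordinates, equal resolution in both views, and a hypothetical pixel threshold `max_dist`. All function names are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K_ref, K_aux, R, t):
    """F = K_aux^{-T} [t]_x R K_ref^{-1}, with x_aux = R @ x_ref + t."""
    E = skew(t) @ R
    return np.linalg.inv(K_aux).T @ E @ np.linalg.inv(K_ref)

def pixel_grid(h, w):
    """Homogeneous pixel coordinates (x, y, 1), row-major, shape (h*w, 3)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(float)

def epipolar_mask(F, h, w, max_dist=1.5):
    """mask[i, j] is True if aux pixel j lies within max_dist pixels of the
    epipolar line of reference pixel i. Dense (h*w, h*w); small inputs only."""
    pix = pixel_grid(h, w)
    lines = pix @ F.T                            # row i holds l_i = F @ p_i
    norm = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = np.abs(lines @ pix.T) / np.maximum(norm, 1e-12)
    return dist <= max_dist

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to mask; rows with no
    admissible key return zeros."""
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits = np.where(mask, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(logits - row_max)           # masked entries become 0
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return weights @ v
```

For a purely horizontal camera shift with identical intrinsics, the epipolar lines are image rows, so the mask reduces to "same row only". A practical network would likely apply such a mask at feature resolution and avoid materializing the dense matrix.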