SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang
TL;DR
This work targets robust novel view synthesis under out-of-distribution (OOD) camera angles by introducing SplatFormer, a point transformer that refines an initial 3D Gaussian Splatting (3DGS) representation in a single forward pass. The model learns 3D priors from the large-scale ShapeNet and Objaverse datasets and is trained with a 2D rendering loss that combines an $\mathcal{L}_1$ term and a perceptual term, $\mathcal{L}_{\text{LPIPS}}$, over both in-distribution and OOD views. A new evaluation protocol, OOD-NVS, reveals that prior methods struggle under extreme viewpoint deviations, while SplatFormer achieves state-of-the-art fidelity and 3D consistency in both synthetic and real-world cross-dataset scenarios. The results support applying a 3D point transformer directly to Gaussian splats and highlight the practical relevance for immersive AR/VR rendering, where unseen viewpoints are common. Overall, the work shows that data-driven priors and 3D-consistent refinement via transformers can substantially improve OOD renderings while maintaining real-time capabilities.
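As a rough illustration of the training objective described above, the rendering loss can be sketched as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $\mathcal{V}_{\text{in}}$ and $\mathcal{V}_{\text{OOD}}$ denote the in-distribution and OOD supervision views, $\hat{I}_v$ is the image rendered from the refined Gaussians at view $v$, $I_v$ is the ground-truth image, and $\lambda$ is a hypothetical weight balancing the two terms. The uniform average over views is also an assumption.

$$
\mathcal{L} \;=\; \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \Big( \mathcal{L}_1\big(\hat{I}_v, I_v\big) \;+\; \lambda \, \mathcal{L}_{\text{LPIPS}}\big(\hat{I}_v, I_v\big) \Big), \qquad \mathcal{V} = \mathcal{V}_{\text{in}} \cup \mathcal{V}_{\text{OOD}}
$$

Under this reading, supervision acts only on 2D renderings, so the refinement network receives gradients through the differentiable rasterizer rather than from direct 3D targets.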
Abstract
3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on Gaussian splats. SplatFormer takes as input an initial 3DGS set optimized under limited training views and refines it in a single forward pass, effectively removing potential artifacts in OOD test views. To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.
