
ViewSplat: View-Adaptive Dynamic Gaussian Splatting for Feed-Forward Synthesis

Moonyeon Jeong, Seunggi Min, Suhyeon Lee, Hongje Seong

Abstract

We present ViewSplat, a view-adaptive 3D Gaussian splatting network for novel view synthesis from unposed images. While recent feed-forward 3D Gaussian splatting has significantly accelerated 3D scene reconstruction by bypassing per-scene optimization, a fundamental fidelity gap remains. We attribute this bottleneck to the limited capacity of single-step feed-forward networks to regress static Gaussian primitives that satisfy all viewpoints. To address this limitation, we shift the paradigm from static primitive regression to view-adaptive dynamic splatting. Instead of a rigid Gaussian representation, our pipeline learns a view-adaptable latent representation. Specifically, ViewSplat initially predicts base Gaussian primitives alongside the weights of dynamic MLPs. During rendering, these MLPs take target view coordinates as input and predict view-dependent residual updates for each Gaussian attribute (i.e., 3D position, scale, rotation, opacity, and color). This mechanism, which we term view-adaptive dynamic splatting, allows each primitive to rectify initial estimation errors, effectively capturing high-fidelity appearances. Extensive experiments demonstrate that ViewSplat achieves state-of-the-art fidelity while maintaining fast inference (17 FPS) and real-time rendering (154 FPS).
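To make the mechanism concrete, here is a minimal PyTorch sketch of the residual-update step described above. It is not the authors' implementation: the attribute layout (position, scale, rotation, opacity, color), the two-layer MLP, the hidden width, and the flattened-pose view embedding are all our assumptions.

```python
import torch

# Hypothetical sketch of view-adaptive dynamic splatting (not the paper's code).
# A feed-forward network predicts base Gaussian attributes plus per-Gaussian MLP
# weights; at render time the MLP maps the target view to residual updates.

ATTR_DIM = 3 + 3 + 4 + 1 + 3  # position, scale, rotation (quat), opacity, color
HIDDEN = 16                   # assumed hidden width of each dynamic MLP

def dynamic_residuals(view_embed, w1, b1, w2, b2):
    """Apply per-Gaussian dynamic MLPs to a target-view embedding.

    view_embed: (V,) embedding of the target camera (e.g., its flattened pose).
    w1: (N, HIDDEN, V), b1: (N, HIDDEN)     -- first-layer weights per Gaussian.
    w2: (N, ATTR_DIM, HIDDEN), b2: (N, ATTR_DIM) -- second-layer weights.
    Returns (N, ATTR_DIM) residual offsets for every Gaussian attribute.
    """
    h = torch.relu(torch.einsum("nhv,v->nh", w1, view_embed) + b1)
    return torch.einsum("nah,nh->na", w2, h) + b2

# Toy usage: N Gaussians, a 12-D flattened 3x4 target pose as view embedding.
N, V = 1024, 12
base = torch.randn(N, ATTR_DIM)            # canonical Gaussian attributes
w1, b1 = torch.randn(N, HIDDEN, V), torch.zeros(N, HIDDEN)
w2, b2 = torch.randn(N, ATTR_DIM, HIDDEN) * 0.01, torch.zeros(N, ATTR_DIM)
target_pose = torch.randn(V)
refined = base + dynamic_residuals(target_pose, w1, b1, w2, b2)
```

Because the MLP weights are predicted per Gaussian, the einsum calls apply a different tiny network to every primitive in a single batched operation, so rendering each new viewpoint costs only two small matrix products rather than a full network pass.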

Figures (11)

  • Figure 1: Given unposed images as input, the feed-forward network reconstructs 3D Gaussians. While (a) rendering from static 3D Gaussians often suffers from blurred details, (b) our method introduces view-adaptive Gaussian updates conditioned on the target pose to refine the Gaussians dynamically. As shown in the bottom panels, our approach improves the reconstruction of fine-grained details (e.g., sharp edges and specular reflections) compared to SPFSplat [huang2025spfsplat].
  • Figure 2: Architecture of ViewSplat. Built upon a shared geometry transformer backbone [leroy2024mast3r, wang2025vggt], our framework simultaneously predicts canonical 3D Gaussians and camera poses from unposed images using Gaussian heads and a pose head. To accurately capture view-dependent effects, a view-dependent head dynamically generates per-pixel view MLPs, which take the target pose as input and predict residual offsets for the Gaussians. These offsets are then applied to refine the canonical Gaussians during rendering. (A hypothetical sketch of this head follows the figure list below.)
  • Figure 3: Qualitative comparison on RE10K. Compared to baseline methods, our approach accurately reconstructs sharp reflections.
  • Figure 4: Qualitative results of cross-dataset generalization.
  • Figure 5: Artifacts from decoupled $\mu/\alpha$ refinement.
  • ...and 6 more figures
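
The view-dependent head in Figure 2 amounts to a hypernetwork: a layer that emits the flattened weights of each per-pixel view MLP from backbone features. Below is a hedged sketch under the same assumed shapes as the earlier snippet; the feature width, layer sizes, and weight-splitting scheme are our own choices, not the paper's.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Hypothetical hypernetwork head: maps per-pixel backbone features to the
    flattened weights of a small two-layer view MLP (layer sizes assumed)."""

    def __init__(self, feat_dim=256, view_dim=12, hidden=16, attr_dim=14):
        super().__init__()
        self.hidden, self.view_dim, self.attr_dim = hidden, view_dim, attr_dim
        n_params = (hidden * view_dim + hidden) + (attr_dim * hidden + attr_dim)
        self.to_weights = nn.Linear(feat_dim, n_params)

    def forward(self, feats):
        """feats: (N, feat_dim) per-pixel features -> per-Gaussian MLP weights."""
        p = self.to_weights(feats)
        h, v, a = self.hidden, self.view_dim, self.attr_dim
        # Slice the flat parameter vector into the two layers' weights/biases.
        w1, p = p[:, : h * v].reshape(-1, h, v), p[:, h * v :]
        b1, p = p[:, :h], p[:, h:]
        w2, p = p[:, : a * h].reshape(-1, a, h), p[:, a * h :]
        b2 = p
        return w1, b1, w2, b2
```

Its outputs plug directly into `dynamic_residuals` from the earlier sketch, so the weights are generated once per scene and reused across arbitrarily many target views.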