ReSplat: Learning Recurrent Gaussian Splats

Haofei Xu, Daniel Barath, Andreas Geiger, Marc Pollefeys

TL;DR

ReSplat introduces a gradient-free recurrent refinement framework for 3D Gaussian splatting. It initializes a compact, $16\times$ subsampled Gaussian set and uses rendering-error feedback to iteratively update Gaussians, enabling robust generalization across datasets, view counts, and resolutions. The approach achieves state-of-the-art view synthesis with substantially fewer Gaussians and faster rendering on DL3DV, RealEstate10K, and ACID, demonstrating the practicality of gradient-free recurrent refinement for 3D scene representations.

Abstract

While feed-forward Gaussian splatting models offer computational efficiency and can generalize to sparse input settings, their performance is fundamentally constrained by relying on a single forward pass for inference. We propose ReSplat, a feed-forward recurrent Gaussian splatting model that iteratively refines 3D Gaussians without explicitly computing gradients. Our key insight is that the Gaussian splatting rendering error serves as a rich feedback signal, guiding the recurrent network to learn effective Gaussian updates. This feedback signal naturally adapts to unseen data distributions at test time, enabling robust generalization across datasets, view counts, and image resolutions. To initialize the recurrent process, we introduce a compact reconstruction model that operates in a $16 \times$ subsampled space, producing $16 \times$ fewer Gaussians than previous per-pixel Gaussian models. This substantially reduces computational overhead and allows for efficient Gaussian updates. Extensive experiments across varying numbers of input views (2, 8, 16, 32), resolutions ($256 \times 256$ to $540 \times 960$), and datasets (DL3DV, RealEstate10K, and ACID) demonstrate that our method achieves state-of-the-art performance while significantly reducing the number of Gaussians and improving the rendering speed. Our project page is at https://haofeixu.github.io/resplat/.
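
To make the $16\times$ reduction concrete, the short sketch below counts Gaussians for a per-pixel model versus a $16\times$ subsampled one. It assumes that a per-pixel model emits exactly one Gaussian per pixel per input view; the function name `gaussian_counts` and the pairing of view counts with resolutions in the loop are illustrative choices, not taken from the paper.

```python
def gaussian_counts(num_views: int, height: int, width: int,
                    subsample: int = 16) -> tuple[int, int]:
    """Return (per_pixel_count, compact_count) for a set of input views.

    Assumption: a per-pixel model predicts one Gaussian per pixel per view,
    and the compact model predicts `subsample` times fewer.
    """
    per_pixel = num_views * height * width
    compact = per_pixel // subsample
    return per_pixel, compact


# Illustrative settings only; the abstract lists the view counts and the
# resolution range, but this particular pairing is an assumption.
for n, h, w in [(2, 256, 256), (8, 512, 960), (32, 540, 960)]:
    per_pixel, compact = gaussian_counts(n, h, w)
    print(f"{n:>2} views at {h}x{w}: per-pixel={per_pixel:,} compact={compact:,}")
```

Under these assumptions, 8 views at $512 \times 960$ give 3,932,160 per-pixel Gaussians versus 245,760 in the subsampled space.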

Paper Structure

This paper contains 14 sections, 9 equations, 15 figures, 13 tables.

Figures (15)

  • Figure 1: Learning recurrent Gaussian splats in a feed-forward manner. We propose ReSplat, a feed-forward recurrent network that iteratively refines 3D Gaussian splats to improve quality in sparse-view settings, where optimization-based 3DGS (Kerbl et al., 2023) struggles. As initialization (iteration 0), we introduce a compact reconstruction model that predicts Gaussians in a $16\times$ subsampled space, producing $16\times$ fewer Gaussians and $4\times$ faster rendering than the per-pixel MVSplat (Chen et al., 2024) and DepthSplat (Xu et al., 2025). The reduced number of Gaussians makes subsequent refinement efficient. Compared to optimization-based 3DGS, ReSplat is $100\times$ faster thanks to its feed-forward design, while still benefiting from iterative updates. Here we show results for 8 input views ($512 \times 960$ resolution) on the DL3DV dataset; see \ref{tab:highres_8view_dl3dv} for detailed metrics.
  • Figure 2: Learning to recurrently update 3D Gaussians. Given $N$ posed input images, we first predict per-view depth maps at $1/4$ resolution and then unproject and transform them to a point cloud with image features $\{({\bm p}_j, {\bm f}_j) \}_{j=1}^{M}$, where $M = N \times \frac{HW}{16}$ is the number of points. We then reconstruct an initial set of 3D Gaussians $\{ ({\bm g}^0_j, {\bm z}^0_j) \}_{j=1}^M$ in a $16\times$ subsampled 3D space with a kNN and global attention-based Gaussian regressor. Next, we learn to refine the initial Gaussians recurrently. At each recurrent step $t$, we use the current Gaussian prediction to render the input views and then compute the rendering errors $\hat{\bm E}^t$ between the rendered and ground-truth input views. Global attention is then applied to propagate the rendering errors to the 3D Gaussians. A kNN attention-based update module takes as input the concatenation of the current Gaussian parameters ${\bm g}^t_j$, the hidden state ${\bm z}^t_j$, and the rendering error ${\bm e}^t_j$, and predicts the incremental updates $\Delta {\bm g}^t_j$ and $\Delta {\bm z}^t_j$. We iterate this process for a total of $T$ steps.
  • Figure 3: View synthesis on DL3DV. ReSplat outperforms both optimization-based and feed-forward methods, with significantly smaller rendering errors. More samples are presented in \ref{fig:vis_compare_supp} (appendix).
  • Figure 3: Evaluation of two input views ($256 \times 256$) on RealEstate10K. ReSplat outperforms prior feed-forward 3DGS models and matches LVSM's quality with $20\times$ faster rendering.
  • Figure 4: Optimization-based vs. feed-forward refinement. Starting from the same ReSplat initialization, we compare 3DGS optimization-based refinement with our feed-forward approach. ReSplat improves the rendering quality in far fewer iterations (4 vs. 80) and is $53\times$ faster in terms of reconstruction speed. See \ref{tab:resplat_init_comparison} (appendix) for detailed numbers.
  • ...and 10 more figures
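
The recurrent update described in the Figure 2 caption can be illustrated with a toy loop. This is a minimal sketch and not the authors' implementation: `render` is a hypothetical identity stand-in for the Gaussian splatting rasterizer, `update_module` is a hand-written rule standing in for the learned kNN attention-based network, and the global attention step that propagates image-space errors to the Gaussians is omitted because in this toy each error already aligns with one Gaussian. Only the control flow (render, compute error, predict $\Delta {\bm g}^t$ and $\Delta {\bm z}^t$, add them, repeat for $T$ steps, with no gradient computation) mirrors the caption.

```python
import numpy as np


def render(g: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for rendering the input views from Gaussians g.
    # A real system would rasterize 3D Gaussians into images.
    return g


def update_module(g: np.ndarray, z: np.ndarray, e: np.ndarray):
    # Hand-written stand-in for the learned update network. It receives the
    # Gaussian parameters g, the hidden state z, and the per-Gaussian error e,
    # and returns incremental updates (delta_g, delta_z).
    delta_g = 0.5 * e
    delta_z = 0.5 * (e - z)
    return delta_g, delta_z


def refine(g0: np.ndarray, z0: np.ndarray, target: np.ndarray, num_steps: int):
    """Run num_steps of gradient-free refinement; return (g, z, errors)."""
    g, z = g0.copy(), z0.copy()
    errors = []
    for _ in range(num_steps):
        e = target - render(g)                 # rendering error feedback
        errors.append(float(np.abs(e).mean()))
        delta_g, delta_z = update_module(g, z, e)
        g = g + delta_g                        # g^{t+1} = g^t + delta g^t
        z = z + delta_z                        # z^{t+1} = z^t + delta z^t
    return g, z, errors


rng = np.random.default_rng(0)
target = rng.normal(size=(8, 4))   # toy "ground-truth" values
g0 = np.zeros((8, 4))              # toy initial Gaussian parameters
z0 = np.zeros((8, 4))              # toy initial hidden state
g, z, errors = refine(g0, z0, target, num_steps=4)
print(errors)
```

In this toy setting the mean error halves at every step because the stand-in update moves each parameter halfway toward its target; the learned module in the paper is instead trained to predict effective updates from the rendering error.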