Diff3R: Feed-forward 3D Gaussian Splatting with Uncertainty-aware Differentiable Optimization

Yueh-Cheng Liu, Jozef Hladký, Matthias Nießner, Angela Dai

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) present two main directions: feed-forward models offer fast inference in sparse-view settings, while per-scene optimization yields high-quality renderings but is computationally expensive. To combine the benefits of both, we introduce Diff3R, a novel framework that explicitly bridges feed-forward prediction and test-time optimization. By incorporating a differentiable 3DGS optimization layer directly into the training loop, our network learns to predict an optimal initialization for test-time optimization rather than a conventional zero-shot result. To overcome the computational cost of backpropagating through the optimization steps, we propose computing gradients via the Implicit Function Theorem and a scalable, matrix-free PCG solver tailored for 3DGS optimization. Additionally, we incorporate a data-driven uncertainty model into the optimization process by adaptively controlling how much the parameters are allowed to change during optimization. This approach effectively mitigates overfitting in under-constrained regions and increases robustness against input outliers. Since our proposed optimization layer is model-agnostic, we show that it can be seamlessly integrated into existing feed-forward 3DGS architectures for both pose-given and pose-free methods, providing improvements for test-time optimization.
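The abstract's key computational idea, differentiating through the inner 3DGS optimization via the Implicit Function Theorem with a matrix-free PCG/CG solve, can be illustrated on a toy quadratic. This is a minimal sketch, not the paper's implementation: at a stationary point the IFT reduces the backward pass to solving a linear system with the inner Hessian, and the sketch below solves it matrix-free (only Hessian-vector products, here a hypothetical `hvp` on a small dense matrix standing in for autodiff through the rendering loss):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy inner objective: f(theta) = 0.5 * theta^T A theta - b^T theta,
# so the Hessian is A and stationarity means A theta* = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)      # symmetric positive-definite Hessian
g_outer = rng.standard_normal(5)   # outer-loss gradient at theta*

def hvp(v):
    # Hessian-vector product. In the 3DGS setting this would come from
    # autodiff through the photometric loss; H is never formed explicitly.
    return A @ v

# IFT backward pass: the gradient w.r.t. the network's initialization
# involves H^{-1} g_outer; we only need that one solve, done matrix-free.
H = LinearOperator((5, 5), matvec=hvp)
v, info = cg(H, g_outer)
assert info == 0                                  # CG converged
assert np.allclose(A @ v, g_outer, atol=1e-4)     # v solves H v = g_outer
```

A preconditioner (the "P" in PCG) would be passed to `cg` via its `M` argument; the choice of preconditioner for 3DGS parameters is specific to the paper and not reproduced here.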

Paper Structure

This paper contains 23 sections, 32 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Optimization-Aware 3D Gaussian Splatting. (a) Standard feed-forward 3DGS models directly predict Gaussian parameters, frequently resulting in blurry novel views. Applying test-time optimization (TTO) to these predictions in sparse-view settings is highly under-constrained, making it easy to overfit to the input views and trapping the solution in poor local minima with severe visual artifacts. (b) By training through the optimization process via implicit gradients, our method learns an optimization-aware initialization, tailored for 3DGS post-optimization.
  • Figure 2: Overview of our Uncertainty-Aware Differentiable 3DGS Framework. Given a sparse set of context views (with optional camera parameters), our feed-forward network predicts an initial set of 3D Gaussian parameters ($\Theta_0$). Our proposed differentiable optimization layer refines these parameters via gradient descent to yield the optimized Gaussians ($\Theta^*$). To train the network end-to-end, we introduce an efficient analytical solution for the backward pass using implicit gradients and a matrix-free PCG solver. Additionally, to make the optimization more robust in sparse-view settings, we predict learnable uncertainty weights ($\boldsymbol{\Lambda}$). These weights act as an adaptive proximal bound on the optimization trajectory, preventing the model from overfitting to the context views.
  • Figure 3: Qualitative results on the RE10K dataset in the pose-given setting. We compare ours against feed-forward 3DGS methods such as PixelSplat [charatan2024pixelsplat], MVSplat [chen2024mvsplat], and DepthSplat [xu2025depthsplat] with two input views. As shown in the zoomed-in regions, ours generates sharper results and is more robust to exposure noise in the input views during optimization, thanks to the adaptive regularization.
  • Figure 4: Qualitative results on ScanNet++. We compare our pose-free method against Depth Anything v3 (DA3) [lin2025depth] after optimization on ScanNet++ with four input views. Our method reduces artifacts during optimization and provides sharper renderings.
  • Figure 5: Visualization of learned uncertainty weights. We visualize the regularization weights applied to the means of the per-pixel Gaussians ($\Lambda_\text{mean}$). Notably, the network intrinsically learns to predict higher anchoring weights for regions lacking multi-view constraints (e.g., the left third of Input 1, which is not visible in Input 2) without explicit supervision. This helps to prevent overfitting during the test-time optimization process.
  • ...and 1 more figure
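Figures 2 and 5 describe the uncertainty weights $\boldsymbol{\Lambda}$ acting as an adaptive proximal bound on the TTO trajectory. A minimal sketch of that mechanism, under the assumption that each weight penalizes deviation from the feed-forward initialization $\Theta_0$ (the function names and the simple quadratic stand-in for the photometric loss are illustrative, not the paper's):

```python
import numpy as np

def tto_step(theta, theta0, grad_loss, lam, lr=0.01):
    """One TTO step on a proximally regularized objective:
        L(theta) + 0.5 * sum(lam * (theta - theta0)**2).
    Large lam (strong anchoring, e.g. an under-constrained region) pins a
    parameter near its initialization; small lam lets it move freely.
    """
    g = grad_loss(theta) + lam * (theta - theta0)
    return theta - lr * g

# Toy example: the data loss pulls every parameter toward an "overfit"
# target, but heavily anchored parameters stay near the initialization.
theta0 = np.zeros(4)
target = np.ones(4)                       # where unregularized TTO would end up
lam = np.array([100.0, 100.0, 0.0, 0.0])  # first two params heavily anchored

theta = theta0.copy()
for _ in range(2000):
    theta = tto_step(theta, theta0, lambda t: t - target, lam)

# Anchored entries settle near 1/(1+lam) ~ 0.01; unanchored ones reach 1.
```

The fixed point of each coordinate is $1/(1+\lambda)$, which makes the trade-off explicit: the weight continuously interpolates between trusting the feed-forward prediction and trusting the test-time photometric evidence.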