SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

Zicheng Zhang, Xiangting Meng, Ke Wu, Wenchao Ding

Abstract

Recent progress in feed-forward 3D Gaussian Splatting (3DGS) has notably improved rendering quality. However, the spatially uniform and highly redundant 3DGS maps generated by previous feed-forward 3DGS methods limit their integration into downstream reconstruction tasks. We propose SparseSplat, the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to the scene structure and information richness of local regions, yielding highly compact 3DGS maps. To achieve this, we propose entropy-based probabilistic sampling, which generates large, sparse Gaussians in textureless areas and assigns small, dense Gaussians to regions with rich information. Additionally, we design a specialized point cloud network that efficiently encodes local context and decodes it into 3DGS attributes, addressing the receptive-field mismatch between the general 3DGS optimization pipeline and feed-forward models. Extensive experimental results demonstrate that SparseSplat achieves state-of-the-art rendering quality with only 22% of the Gaussians and maintains reasonable rendering quality with only 1.5% of the Gaussians. Project page: https://victkk.github.io/SparseSplat-page/.
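To make the entropy-based probabilistic sampling concrete, the snippet below is a minimal, hypothetical sketch (not the authors' implementation): it assumes a per-pixel entropy map is available, normalizes it into a sampling distribution, and draws a fixed pixel budget so that information-rich regions receive dense samples while textureless regions receive sparse ones. The function name, tensor shapes, and normalization choice are illustrative assumptions; the paper's exact transform from entropy to probability may differ.

```python
import torch

def sample_pixels_by_entropy(entropy_map: torch.Tensor, num_samples: int) -> torch.Tensor:
    """Hypothetical sketch: draw pixel locations proportionally to per-pixel entropy.

    entropy_map: (H, W) non-negative entropy of local image/feature statistics.
    Returns (num_samples, 2) integer (row, col) coordinates.
    """
    _, w = entropy_map.shape
    # Normalize entropy into a categorical distribution over pixels, so
    # information-rich regions are sampled densely and flat regions sparsely.
    weights = entropy_map.flatten().clamp(min=1e-8)
    probs = weights / weights.sum()
    # Sample a fixed pixel budget without replacement; each sampled pixel
    # later becomes one 3D anchor point after back-projection with depth.
    flat_idx = torch.multinomial(probs, num_samples, replacement=False)
    return torch.stack([flat_idx // w, flat_idx % w], dim=-1)
```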

Paper Structure

This paper contains 48 sections, 11 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: SparseSplat achieves state-of-the-art rendering quality on DL3DV [dl3dv] using significantly fewer Gaussians than the previous SOTA, DepthSplat [depthsplat] (150k vs. 688k). Our model also generates competitive results in sparse settings (e.g., 10k). As illustrated by the ellipsoid renderings [sibr2020], SparseSplat adaptively allocates Gaussian density based on scene content. This contrasts with previous methods that adopt a pixel-aligned strategy, which produces spatially uniform and highly redundant Gaussian primitives even in textureless regions.
  • Figure 2: Overall Pipeline of SparseSplat. Our method begins by using a frozen backbone [depthsplat] to generate feature maps and depth maps from multi-view posed images. Next, in the Adaptive Primitive Sampling stage, entropy maps are calculated and transformed into probability maps for sampling, yielding a sparse set of 2D pixels. These pixels are then back-projected into 3D space using the predicted depth to form 3D Sparse Anchor Points (see the back-projection and KNN sketch after this figure list). Finally, for each anchor point, we gather its local point cloud neighborhood via KNN. This local neighborhood is fed into a lightweight prediction head that predicts its complete Gaussian attributes ($\alpha, s, q, c$). All predicted Gaussian primitives are then merged to generate the final scene.
  • Figure 3: The Locality of Classic 3DGS Optimization. In this example, three 3D Gaussian primitives are splatted onto the 2D image plane. Primitive $g_c$ covers two pixels: one covered exclusively by $g_c$, and another accumulating contributions from all three primitives. During backpropagation, gradients propagate to $g_c$ through both pixels. Notably, $g_a$ and $g_b$ modulate the gradient flow at the shared pixel by affecting the rendering process. The gradient flow to $g_c$ is detailed in Eq. \eqref{eq:gs_description}.
  • Figure 4: Rendering quality comparisons on DL3DV. Our model matches the SOTA rendering quality of DepthSplat with only 150k Gaussians (vs. 688k). Under sparse settings (40k and 10k), our method maintains structural integrity and shows minor progressive blurring.
  • Figure 5: Additional qualitative comparisons.
  • ...and 1 more figure
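As a companion to the pipeline described in the Figure 2 caption, the following sketch illustrates how sampled pixels could be lifted into 3D anchor points using the predicted depth and camera intrinsics, and how each anchor's local point-cloud neighborhood could be gathered via KNN before being fed to a prediction head. All function names, tensor layouts, and the neighborhood size are assumptions for illustration, not the paper's code; world-space transforms via camera extrinsics are omitted.

```python
import torch

def backproject_pixels(pixels: torch.Tensor, depth: torch.Tensor,
                       K: torch.Tensor) -> torch.Tensor:
    """Lift sampled pixels (N, 2) as (row, col) into camera-space 3D points.

    depth: (H, W) predicted depth map; K: (3, 3) camera intrinsics.
    """
    rows, cols = pixels[:, 0], pixels[:, 1]
    z = depth[rows, cols]                          # per-pixel predicted depth, (N,)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (cols.float() - cx) * z / fx
    y = (rows.float() - cy) * z / fy
    return torch.stack([x, y, z], dim=-1)          # (N, 3) sparse anchor points

def gather_knn_neighborhoods(anchors: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Gather each anchor's k nearest anchors as its local point-cloud context.

    Returns neighbor indices of shape (N, k); a lightweight prediction head
    would consume the corresponding points/features to regress the Gaussian
    attributes (alpha, s, q, c) for each anchor.
    """
    dists = torch.cdist(anchors, anchors)          # (N, N) pairwise Euclidean distances
    return dists.topk(k, largest=False).indices    # (N, k) indices of nearest anchors
```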