Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction

Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang

TL;DR

This work addresses the limited expressiveness of 3D Gaussian Splatting (3DGS) by introducing Neural Texture Splatting (NTS), which uses a global neural field to predict a local RGBA texture field for each splat. A global tri-plane representation with lightweight decoders enables view- and time-dependent local textures, while sharing information across primitives and applying CP (CANDECOMP/PARAFAC) decomposition keeps the model efficient. Across dense-view and sparse-view tasks, covering both static and dynamic scenes, NTS improves state-of-the-art 3DGS variants in novel view synthesis, geometry reconstruction, and dynamic reconstruction, while reducing per-primitive overfitting and improving generalization. NTS incurs additional computation from ray–Gaussian intersections and neural decoding, but it remains a plug-and-play enhancement to existing 3DGS pipelines for high-fidelity 3D scene reconstruction and rendering.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a leading approach for high-quality novel view synthesis, with numerous variants extending its applicability to a broad spectrum of 3D and 4D scene reconstruction tasks. Despite its success, the representational capacity of 3DGS remains limited by the use of 3D Gaussian kernels to model local variations. Recent works have proposed to augment 3DGS with additional per-primitive capacity, such as per-splat textures, to enhance its expressiveness. However, these per-splat texture approaches primarily target dense novel view synthesis with a reduced number of Gaussian primitives, and their effectiveness tends to diminish when applied to more general reconstruction scenarios. In this paper, we aim to achieve concrete performance improvement over state-of-the-art 3DGS variants across a wide range of reconstruction tasks, including novel view synthesis, geometry and dynamic reconstruction, under both sparse and dense input settings. To this end, we introduce Neural Texture Splatting (NTS). At the core of our approach is a global neural field (represented as a hybrid of a tri-plane and a neural decoder) that predicts local appearance and geometric fields for each primitive. By leveraging this shared global representation that models local texture fields across primitives, we significantly reduce model size and facilitate efficient global information exchange, demonstrating strong generalization across tasks. Furthermore, our neural modeling of local texture fields introduces expressive view- and time-dependent effects, a critical aspect that existing methods fail to account for. Extensive experiments show that Neural Texture Splatting consistently improves models and achieves state-of-the-art results across multiple benchmarks.
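To make the idea of a shared global field concrete, the following is a minimal sketch of a generic tri-plane lookup followed by a tiny decoder that outputs RGBA. It is not the authors' architecture: the function names (`bilinear`, `query_triplane`, `decode_rgba`), the choice to sum features from the three planes, and the single linear layer standing in for the neural decoder are all assumptions made for illustration.

```python
import math


def bilinear(plane, u, v):
    """Bilinearly sample a feature plane at (u, v), both in [0, 1].

    plane[row][col] is a list of feature channels; the plane must be at least 2x2.
    """
    h, w = len(plane), len(plane[0])
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][k]
        + fx * (1 - fy) * plane[y0][x0 + 1][k]
        + (1 - fx) * fy * plane[y0 + 1][x0][k]
        + fx * fy * plane[y0 + 1][x0 + 1][k]
        for k in range(channels)
    ]


def query_triplane(planes, p):
    """Project p = (x, y, z) in [0, 1]^3 onto the xy, xz and yz planes.

    Assumption: features from the three planes are summed (concatenation is
    another common choice).
    """
    x, y, z = p
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def decode_rgba(feature, weights, bias):
    """Toy decoder: one linear layer plus a sigmoid, giving 4 values in (0, 1).

    Assumption: a real decoder would be a small MLP, possibly conditioned on
    view direction and time.
    """
    out = []
    for row, b in zip(weights, bias):
        s = sum(w * f for w, f in zip(row, feature)) + b
        out.append(1.0 / (1.0 + math.exp(-s)))
    return out
```

Because every primitive queries the same three planes, nearby splats read overlapping features, which is one plausible way to picture the "global information exchange" the abstract refers to.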

Paper Structure

This paper contains 40 sections, 11 equations, 9 figures, 21 tables.

Figures (9)

  • Figure 1: Method Overview. Our method enhances 3D Gaussian Splatting (3DGS) by introducing a local RGBA tri-plane texture to each splat (top right). During rendering with our proposed textured Gaussian Splatting (bottom, Sec. \ref{subsec:textured_gaussian_splatting}), each ray computes its intersection point with the splats and queries the corresponding RGBA textures, which are combined with the original Gaussian attributes to produce the final rendering result via volume rendering (Eq. \ref{eqn:textured_vol_rendering}). We show checkerboard and learned RGB texture renderings for visualization. While this local texture field improves the representational capacity of color and opacity, it also increases the risk of overfitting and cannot capture view- and time-dependent variations. To address these limitations, we introduce a global tri-plane neural field that models the local texture fields in a compact and shared manner (top, Sec. \ref{subsec:neural_global_encoding}).
  • Figure 2: Qualitative comparison of novel view synthesis on the MipNeRF360 dataset. Our method enhances baselines by better handling view-dependent effects and preserving fine-grained structures.
  • Figure 3: Average PSNR and SSIM curves with varying input views on Blender under the sparse-view setup. Our method consistently improves upon SplatFields across input views ranging from 4 to 10.
  • Figure 4: Average PSNR and SSIM curves with varying input views on Owlii under the sparse-view dynamic setup. Our method consistently improves upon SplatFields across input views ranging from 4 to 10.
  • Figure 5: Qualitative results for sparse-view dynamic reconstruction on Owlii using 4 and 6 input views. Our method effectively reduces floaters and inaccurate boundaries compared to SplatFields.
  • ...and 4 more figures
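
The rendering step described in the Figure 1 caption can be sketched as standard front-to-back alpha compositing along a single ray, with each splat's queried texture modulating its color and opacity. This is only an illustration under stated assumptions: the listing here does not give the paper's exact combination rule, so the additive color and multiplicative opacity below, along with the name `composite_ray` and the dictionary keys, are hypothetical.

```python
def composite_ray(hits, min_transmittance=1e-4):
    """Front-to-back alpha compositing for one ray.

    hits: list of dicts sorted near-to-far, each with
      'color'     (r, g, b) base Gaussian color,
      'alpha'     Gaussian opacity evaluated at the ray-splat intersection,
      'tex_rgb'   (r, g, b) texture value queried at the intersection,
      'tex_alpha' texture opacity queried at the intersection.
    Returns the accumulated RGB and the remaining transmittance.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for h in hits:
        # Assumed combination rule: opacities multiply, texture color is
        # added to the base color and clamped to [0, 1].
        alpha = h["alpha"] * h["tex_alpha"]
        color = [min(max(c + t, 0.0), 1.0) for c, t in zip(h["color"], h["tex_rgb"])]
        for k in range(3):
            out[k] += transmittance * alpha * color[k]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # early termination once the ray is effectively opaque
    return out, transmittance
```

For example, a half-transparent red splat in front of a green splat whose texture halves its opacity contributes 0.5 red and 0.25 green, leaving a transmittance of 0.25.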