ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for 3D Gaussian Splatting

Jordi Muñoz Vicente

TL;DR

ImprovedGS+ is a high-performance, low-level C++/CUDA re-implementation of the ImprovedGS strategy, built natively within the LichtFeld-Studio framework. Results validate it as a scalable, high-speed solution that upholds the ecosystem's core pillars of Speed, Quality, and Usability.

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have shifted the focus toward balancing reconstruction fidelity with computational efficiency. In this work, we propose ImprovedGS+, a high-performance, low-level re-implementation of the ImprovedGS strategy, built natively within the LichtFeld-Studio framework. By moving from high-level Python logic to hardware-optimized C++/CUDA kernels, we significantly reduce host-device synchronization and training latency. Our implementation introduces a Long-Axis-Split (LAS) CUDA kernel, custom Laplacian-based importance kernels with Non-Maximum Suppression (NMS) for edge scores, and an adaptive Exponential Scale Scheduler. Experiments on the Mip-NeRF360 dataset show that ImprovedGS+ establishes a new Pareto-optimal front for scene reconstruction. Our 1M-budget variant outperforms the state-of-the-art MCMC baseline: it reduces training time by 26.8% (17 minutes per training session) and uses 13.3% fewer Gaussians while achieving higher visual quality. Our full variant improves PSNR by 1.28 dB over the ADC baseline with a 38.4% reduction in parametric complexity. These results validate ImprovedGS+ as a scalable, high-speed solution that upholds the core pillars of Speed, Quality, and Usability within the LichtFeld-Studio ecosystem.
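
The abstract names a Long-Axis-Split (LAS) CUDA kernel but does not give its formulation here. As a rough illustration only, the following host-side C++ sketch splits one Gaussian into two children placed symmetrically along its longest principal axis. It rests on several assumptions: scales are stored as linear extents (not log-scales), the rotation is a (w, x, y, z) quaternion, and the `Gaussian` struct, the helper names, and the constants `kOffsetFactor` and `kShrinkFactor` are all hypothetical placeholders rather than the paper's code or values. A CUDA version could run the same per-Gaussian logic with one thread per Gaussian selected for splitting.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical Gaussian layout for this sketch (not LichtFeld-Studio's actual storage).
struct Gaussian {
    std::array<float, 3> mean;
    std::array<float, 3> scale;     // linear per-axis extents (assumed, not log-scales)
    std::array<float, 4> rotation;  // quaternion (w, x, y, z), normalized below
    float opacity;
};

// Hypothetical split constants: placeholders, not values taken from the paper.
constexpr float kOffsetFactor = 0.5f;  // child offset as a fraction of the longest scale
constexpr float kShrinkFactor = 0.5f;  // factor applied to the longest scale of each child

// Column k of the rotation matrix of q, i.e. the world direction of local axis k.
std::array<float, 3> rotation_column(const std::array<float, 4>& q, int k) {
    const float n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    assert(n > 0.0f && "quaternion must be non-zero");
    const float w = q[0] / n, x = q[1] / n, y = q[2] / n, z = q[3] / n;
    switch (k) {
        case 0:  return {1.0f - 2.0f * (y * y + z * z), 2.0f * (x * y + w * z), 2.0f * (x * z - w * y)};
        case 1:  return {2.0f * (x * y - w * z), 1.0f - 2.0f * (x * x + z * z), 2.0f * (y * z + w * x)};
        default: return {2.0f * (x * z + w * y), 2.0f * (y * z - w * x), 1.0f - 2.0f * (x * x + y * y)};
    }
}

// Split g into two children placed symmetrically along its longest principal axis.
std::pair<Gaussian, Gaussian> long_axis_split(const Gaussian& g) {
    int k = 0;  // index of the longest local axis
    for (int i = 1; i < 3; ++i)
        if (g.scale[i] > g.scale[k]) k = i;

    const std::array<float, 3> axis = rotation_column(g.rotation, k);
    const float offset = kOffsetFactor * g.scale[k];

    // Children inherit rotation, the two shorter scales and opacity (an assumption).
    Gaussian a = g, b = g;
    for (int i = 0; i < 3; ++i) {
        a.mean[i] = g.mean[i] + offset * axis[i];
        b.mean[i] = g.mean[i] - offset * axis[i];
    }
    a.scale[k] = kShrinkFactor * g.scale[k];
    b.scale[k] = a.scale[k];
    return {a, b};
}
```

Take a Gaussian at the origin with identity rotation and scales (1, 3, 1). Under these placeholder constants, its children sit at y = ±1.5 and each has a y-scale of 1.5. Splitting only along the dominant axis leaves the two shorter extents untouched. This fits the stated intent of refining elongated Gaussians without inflating the overall count.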

Paper Structure (18 sections, 1 figure, 5 tables, 2 algorithms)

Introduction
Methodology
Laplacian Filter Kernel Implementation
Future Work
Direct CUDA Kernel Long-Axis-Split (LAS)
Algorithm 1
Specialized Global Transformation for LAS
Refinement Setup
Positional Learning Rate Optimization
Global Warm-up Phase (Initial Score Masking)
Experiments and Results
Datasets and Metrics
Ablation Study
Impact of the Scale Scheduler
Results
(3 further sections not listed)

Figures (1)

Figure 1: Visual comparison of edge importance maps on the Bicycle scene. The zoomed regions show how our CUDA-based NMS kernel reduces noise and concentrates the scores on a thinned structural backbone compared to the baseline.
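
Figure 1 contrasts edge importance maps with and without NMS. The following CPU reference in C++ is one minimal way to compute such a map. It assumes a row-major grayscale image and uses the absolute response of a 4-neighbour Laplacian as the edge score, followed by a simple 3×3 local-maximum suppression. The function names are hypothetical. The paper's CUDA kernels may use a different Laplacian stencil or a direction-aware (Canny-style) NMS. In a CUDA port, each iteration of the per-pixel loops could map to one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Clamp-to-edge pixel fetch from a row-major grayscale image of size w x h.
static float pixel_at(const std::vector<float>& img, int w, int h, int x, int y) {
    x = std::clamp(x, 0, w - 1);
    y = std::clamp(y, 0, h - 1);
    return img[static_cast<std::size_t>(y) * w + x];
}

// Edge score = |Laplacian| with the 4-neighbour stencil [0 1 0; 1 -4 1; 0 1 0].
std::vector<float> laplacian_edge_score(const std::vector<float>& img, int w, int h) {
    assert(static_cast<std::size_t>(w) * h == img.size());
    std::vector<float> score(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float lap = pixel_at(img, w, h, x - 1, y) + pixel_at(img, w, h, x + 1, y)
                            + pixel_at(img, w, h, x, y - 1) + pixel_at(img, w, h, x, y + 1)
                            - 4.0f * pixel_at(img, w, h, x, y);
            score[static_cast<std::size_t>(y) * w + x] = std::fabs(lap);
        }
    return score;
}

// 3x3 non-maximum suppression: a score survives only if no in-bounds neighbour
// is strictly larger (so equal-valued plateaus are kept); all others become 0.
std::vector<float> nms_3x3(const std::vector<float>& score, int w, int h) {
    std::vector<float> out(score.size(), 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float s = score[static_cast<std::size_t>(y) * w + x];
            bool is_max = true;
            for (int dy = -1; dy <= 1 && is_max; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if ((dx == 0 && dy == 0) || nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (score[static_cast<std::size_t>(ny) * w + nx] > s) { is_max = false; break; }
                }
            if (is_max) out[static_cast<std::size_t>(y) * w + x] = s;
        }
    return out;
}
```

Consider a 5×5 image containing a single bright pixel. The Laplacian marks that pixel with score 4 and its four direct neighbours with score 1, and suppression then keeps only the centre. This is the kind of thinning Figure 1 illustrates.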
