LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels
Yi Hu, Huiyang Zhou
TL;DR
3D Gaussian splatting (3DGS) enables real-time 3D rendering but relies on complex GPU kernels that are difficult to optimize across hardware. The authors explore LLM-assisted code optimization, employing a planner, profiling-guided pruning, and an evolutionary search with LLM-powered correctness checks. Results show that LLMs can yield meaningful speedups on the original 3DGS code (up to 24% in a single query and ~42% with profiling data), although manually optimized baselines can still surpass current LLM performance in some cases. For more heavily optimized frameworks such as Seele, LLMs offer modest gains (~6%) after manual fixes. The work demonstrates a practical collaboration framework between domain experts and LLMs for GPU kernel optimization in real-time rendering, and it highlights limitations in functional equivalence and the need for automatic repair in future work.
Abstract
3D Gaussian splatting (3DGS) is a transformative technique with profound implications for novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, the increasing complexity of GPU architectures and the vast search space of performance-tuning parameters make such optimization a challenging task. Although manual optimizations have achieved remarkable speedups, they require domain expertise, and the optimization process can be highly time-consuming and error-prone. In this paper, we propose to exploit large language models (LLMs) to analyze and optimize Gaussian splatting kernels. To our knowledge, this is the first work to use LLMs to optimize highly specialized real-world GPU kernels. We reveal the intricacies of using LLMs for code optimization and analyze the code optimization techniques that the LLMs apply. We also propose ways to collaborate with LLMs to further leverage their capabilities. For the original 3DGS code on the MipNeRF360 datasets, LLMs achieve significant speedups, 19% with DeepSeek and 24% with GPT-5, demonstrating that different LLMs have different capabilities. When additional information from performance profilers is fed to the LLMs, the performance improvement of the LLM-optimized code rises to as much as 42%, and 38% on average. In comparison, our best-effort manually optimized version achieves a performance improvement of up to 48%, and 39% on average, showing that there are still optimizations beyond the capabilities of current LLMs. On the other hand, even for Seele, a recently proposed 3DGS framework with algorithmic optimizations, LLMs can further improve performance by 6%, showing that there are optimization opportunities missed by domain experts. This highlights the potential of collaboration between domain experts and LLMs.
