Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo
TL;DR
Splatonic tackles the real-time constraint of 3D Gaussian splatting (3DGS) SLAM on mobile devices by pairing adaptive sparse pixel sampling and a pixel-based rendering pipeline with a co-designed hardware accelerator. Sparse sampling cuts the rasterization workload dramatically, and the new pipeline shifts computations such as α-checking into projection, enabling Gaussian-level parallelism and reducing warp divergence. Across four 3DGS-SLAM algorithms, the authors report up to 14.6× end-to-end speedup in software on off-the-shelf GPUs, and up to 274.9× speedup and 4738.5× energy savings over mobile GPUs with the dedicated hardware, all at comparable accuracy. The work shows a viable path to deploying high-fidelity 3DGS-SLAM on resource-constrained platforms and highlights the potential of pixel-based sparse processing for neural rendering.
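The summary does not say how pixels are chosen, so the sketch below is only an illustration of sparse pixel sampling in the spirit of classical direct SLAM. It assumes a hypothetical DSO-style rule: split the frame into `tile`×`tile` blocks and keep the single pixel with the largest intensity-gradient magnitude in each block. The function name `sample_sparse_pixels` and the gradient criterion are our assumptions, not Splatonic's published method, and the sketch does not model the "adaptive" part (varying the sampling density). It does show the arithmetic: one pixel per 16×16 tile is exactly a 256× reduction in rendered pixels.

```python
from typing import List, Tuple


def sample_sparse_pixels(img: List[List[float]], tile: int = 16) -> List[Tuple[int, int]]:
    """Keep one (y, x) pixel per tile x tile block: the one with the largest gradient.

    Hypothetical DSO-style criterion for illustration only; Splatonic's
    actual adaptive sampling rule may differ.
    """
    h, w = len(img), len(img[0])
    picks: List[Tuple[int, int]] = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            best, best_score = (ty, tx), -1.0
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    # Central differences, clamped at the image border.
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    score = gx * gx + gy * gy
                    # Strict '>' keeps the first pixel in raster order on ties.
                    if score > best_score:
                        best, best_score = (y, x), score
            picks.append(best)
    return picks
```

For example, on a 32×32 frame with a vertical edge at x = 20 and `tile=16`, the function keeps 4 of 1024 pixels (256× fewer), and in the right-hand tiles it lands on the edge at x = 19. Only these pixels would then be rendered and compared against the camera frame during tracking.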
Abstract
3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile platforms due to their high computational cost, especially in the tracking process. This work introduces Splatonic, a sparse and efficient real-time 3DGS-SLAM algorithm-hardware co-design for resource-constrained devices. Inspired by classical SLAM systems, we propose an adaptive sparse pixel sampling algorithm that reduces the number of rendered pixels by up to 256$\times$ while retaining accuracy. To unlock this performance potential on mobile GPUs, we design a novel pixel-based rendering pipeline that improves hardware utilization via Gaussian-parallel rendering and preemptive $\alpha$-checking. Together, these optimizations yield up to 121.7$\times$ speedup on the bottleneck stages and 14.6$\times$ end-to-end speedup on off-the-shelf GPUs. To address the new bottlenecks that our rendering pipeline introduces in projection and aggregation, we propose a pipelined architecture that also simplifies the overall design. Evaluated across four 3DGS-SLAM algorithms, Splatonic achieves up to 274.9$\times$ speedup and 4738.5$\times$ energy savings over mobile GPUs and up to 25.2$\times$ speedup and 241.1$\times$ energy savings over state-of-the-art accelerators, all with comparable accuracy.
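To make "preemptive $\alpha$-checking" and "Gaussian-parallel rendering" concrete, here is a minimal single-pixel sketch of our reading of the idea, not Splatonic's implementation. It assumes the standard 3DGS opacity model $\alpha_i = \min(0.99,\ o_i \exp(-\tfrac{1}{2} d^\top \Sigma_i^{-1} d))$ with the reference rasterizer's skip threshold $\alpha_i < 1/255$. The check runs once per Gaussian at projection time for a sampled pixel, so the renderer only receives Gaussians that contribute. Compositing is then written as independent per-Gaussian work: an exclusive product scan for transmittance $T_i = \prod_{j<i}(1-\alpha_j)$ followed by a sum reduction $C = \sum_i T_i \alpha_i c_i$. The names `Gaussian2D`, `project_with_alpha_check`, and `render_pixel` are hypothetical, and color is a single channel for brevity.

```python
import math
from itertools import accumulate
from typing import List, NamedTuple, Tuple

ALPHA_MIN = 1.0 / 255.0  # skip threshold used by the reference 3DGS rasterizer
ALPHA_MAX = 0.99         # opacity clamp used by the reference 3DGS rasterizer


class Gaussian2D(NamedTuple):
    mx: float       # projected mean, x (pixels)
    my: float       # projected mean, y (pixels)
    a: float        # conic (inverse 2D covariance) = [[a, b], [b, c]]
    b: float
    c: float
    opacity: float
    color: float    # single channel for brevity
    depth: float


def alpha_at(g: Gaussian2D, px: float, py: float) -> float:
    """Opacity of Gaussian g evaluated at pixel (px, py)."""
    dx, dy = px - g.mx, py - g.my
    power = -0.5 * (g.a * dx * dx + 2.0 * g.b * dx * dy + g.c * dy * dy)
    return min(ALPHA_MAX, g.opacity * math.exp(power))


def project_with_alpha_check(gs: List[Gaussian2D], px: float, py: float) -> List[Tuple[float, float]]:
    """Projection-stage filter: depth-sort, then keep only Gaussians whose
    alpha at the sampled pixel passes the threshold, as (alpha, color) pairs."""
    kept = []
    for g in sorted(gs, key=lambda g: g.depth):
        alpha = alpha_at(g, px, py)
        if alpha >= ALPHA_MIN:
            kept.append((alpha, g.color))
    return kept


def render_pixel(kept: List[Tuple[float, float]]) -> float:
    """Front-to-back compositing expressed as a scan plus a reduction."""
    if not kept:
        return 0.0
    one_minus = [1.0 - a for a, _ in kept]
    # Exclusive product scan: T_0 = 1, T_i = prod_{j<i} (1 - alpha_j).
    trans = [1.0] + list(accumulate(one_minus[:-1], lambda x, y: x * y))
    # Sum reduction over Gaussians: C = sum_i T_i * alpha_i * c_i.
    return sum(t * a * c for t, (a, c) in zip(trans, kept))
```

For two overlapping Gaussians centered on the pixel with opacity 0.5 (front color 1.0, back color 0.0) plus one Gaussian 100 pixels away, the check discards the distant one and `render_pixel` returns 0.5. Because every surviving Gaussian passed the threshold before rendering, threads working on different Gaussians no longer diverge on an in-loop "skip" branch. The scan and reduction play the role we assume the abstract calls aggregation, which is plausibly why it becomes a new bottleneck once rasterization shrinks.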
