EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene
Yixiong Huo, Guangfeng Jiang, Hongyang Wei, Ji Liu, Song Zhang, Han Liu, Xingliang Huang, Mingjie Lu, Jinzhang Peng, Dong Li, Lu Tian, Emad Barsoum
TL;DR
EGSRAL is an enhanced 3D Gaussian Splatting (3D GS) renderer for large-scale driving scenes. It integrates a Deformation Enhancement Module (DEM), an Opacity Enhancement Module (OEM), and a Grouping Strategy (GPS) to improve dynamic-object modeling and rendering efficiency. A novel adaptor enables automatic labeling by transforming coordinates between coordinate systems and generating the corresponding 2D/3D annotations for novel views, guided by three losses and pose augmentation. Experiments show state-of-the-art novel-view synthesis on the KITTI and nuScenes datasets and improved downstream 2D/3D detection when the synthetic annotations are used; ablations confirm the contribution of each component, including the grouping strategy. The method reduces dependence on manual annotation while delivering high-quality renderings suitable for autonomous-driving perception pipelines.
Abstract
3D Gaussian Splatting (3D GS) has gained popularity due to its fast rendering speed and high-quality novel view synthesis. Some researchers have explored using 3D GS to reconstruct driving scenes. However, these methods often rely on additional data types, such as depth maps, 3D boxes, and trajectories of moving objects. Moreover, the lack of annotations for synthesized images limits their direct application in downstream tasks. To address these issues, we propose EGSRAL, a 3D GS-based method that relies solely on training images without extra annotations. EGSRAL enhances 3D GS's capability to model both dynamic objects and static backgrounds, and introduces a novel adaptor for automated labeling that generates annotations for synthesized views from existing annotations. We also propose a grouping strategy for vanilla 3D GS to address perspective issues in rendering large-scale, complex scenes. Our method achieves state-of-the-art performance on multiple datasets without any extra annotation. For example, PSNR reaches 29.04 on the nuScenes dataset. Moreover, our automated labeling can significantly improve the performance of 2D/3D detection tasks. Code is available at https://github.com/jiangxb98/EGSRAL.
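The PSNR figure quoted in the abstract is a standard image-fidelity metric, computed from the mean squared error between a rendered image and its ground-truth reference. As a reminder of what it measures, here is a minimal sketch. It assumes pixel intensities normalized to [0, 1] and images flattened to plain sequences; the function name `psnr` and its signature are illustrative and are not taken from the EGSRAL code base, whose evaluation protocol may differ.

```python
import math
from typing import Sequence


def psnr(rendered: Sequence[float], reference: Sequence[float],
         max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (in dB) between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_val].
    This is an illustrative helper, not the authors' evaluation code.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher values indicate a rendering closer to the reference; because the scale is logarithmic, each 10 dB increase corresponds to a tenfold reduction in mean squared error.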
