ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments

Guile Wu, Dongfeng Bai, Bingbing Liu

TL;DR

ArmGS introduces multi-level appearance refinement for composite 3D Gaussian Splatting to model dynamic urban driving scenes. By refining Gaussians at local, global, and dynamic-actor levels with learned affine transformations and a lightweight spatial-temporal deformation head, it captures fine-grained appearance changes across frames and viewpoints while preserving differentiability. Empirical results on Waymo, KITTI, NOTR, and VKITTI2 show superior reconstruction and novel-view synthesis, as well as real-time rendering, with comprehensive ablations validating each refinement component. The approach advances efficient, photorealistic, and dynamically consistent autonomous driving scene simulation, enabling more realistic validation and testing of driving systems.

Abstract

This work focuses on modeling dynamic urban environments for autonomous driving simulation. Contemporary data-driven methods using neural radiance fields have achieved photorealistic driving scene modeling, but they suffer from low rendering efficiency. Recently, some approaches have explored 3D Gaussian splatting for modeling dynamic urban scenes, enabling high-fidelity reconstruction and real-time rendering. However, these approaches often neglect fine-grained variations between frames and camera viewpoints, leading to suboptimal results. In this work, we propose a new approach named ArmGS that exploits composite driving Gaussian splatting with multi-granularity appearance refinement for autonomous driving scene modeling. The core idea of our approach is to devise a multi-level appearance modeling scheme that optimizes a set of transformation parameters for composite Gaussian refinement at multiple granularities, ranging from the local Gaussian level to the global image level and the dynamic actor level. This models not only global scene appearance variations between frames and camera viewpoints, but also local fine-grained changes of background and objects. Extensive experiments on multiple challenging autonomous driving datasets, namely Waymo, KITTI, NOTR, and VKITTI2, demonstrate the superiority of our approach over state-of-the-art methods.
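
To make the idea of "transformation parameters" concrete, the following is a minimal sketch of a learned affine appearance transform of the kind the TL;DR mentions. It is an illustration under stated assumptions, not the authors' implementation: the class `AffineColorTransform`, the helper `refine_colors`, the 3x3 weight plus bias parameterization, and the identity initialization are all hypothetical, and the summary above does not specify how ArmGS parameterizes or trains its transforms. In a real pipeline the weight and bias would be optimized by gradient descent; here they are plain numbers so the example runs with the standard library only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Color = Tuple[float, float, float]  # RGB values, nominally in [0, 1]


def _identity_weight() -> List[List[float]]:
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


@dataclass
class AffineColorTransform:
    """Hypothetical affine color transform: c' = W c + b.

    One instance could be kept per frame, per camera, or per actor
    (an assumption for illustration, not taken from the paper).
    """

    # Identity initialization: an untrained transform leaves colors unchanged.
    weight: List[List[float]] = field(default_factory=_identity_weight)
    bias: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def apply(self, color: Color) -> Color:
        # Each output channel is a weighted sum of input channels plus an offset.
        r, g, b = (
            sum(w * c for w, c in zip(row, color)) + offset
            for row, offset in zip(self.weight, self.bias)
        )
        return (r, g, b)


def refine_colors(colors: List[Color], transform: AffineColorTransform) -> List[Color]:
    """Apply one affine transform to a batch of colors (e.g. rendered pixels)."""
    return [transform.apply(c) for c in colors]


if __name__ == "__main__":
    pixels = [(0.25, 0.5, 0.125)]
    # A transform that doubles every channel and adds a constant offset,
    # standing in for a learned exposure/brightness change between frames.
    brighter = AffineColorTransform(
        weight=[[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
        bias=[0.5, 0.5, 0.5],
    )
    print(refine_colors(pixels, AffineColorTransform()))
    print(refine_colors(pixels, brighter))
```

Because the map is affine in the input colors, it is differentiable with respect to both the colors and its own parameters, which is consistent with the TL;DR's remark that refinement preserves differentiability. Starting from the identity is a common design choice for such corrections, since the refined output then equals the unrefined one at the beginning of training.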

Paper Structure

This paper contains 31 sections, 9 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: An illustration of our ArmGS for modeling urban scenes. Our approach is capable of modeling fine-grained changes of background scenes and objects across frames and camera viewpoints.
  • Figure 2: An overview of our approach. Our approach refines composite driving scene Gaussians with appearance modeling at multiple granularities, ranging from the local Gaussian level to the global image level and the dynamic actor level. The modules for local-level, global-level, and actor-level refinement are indicated in yellow, green, and blue, respectively.
  • Figure 3: Qualitative comparison with state-of-the-art methods on Waymo. From the first to the fourth row, we show results of scene modeling under foggy, sunny, cloudy, and rainy conditions. We highlight some fine-grained details, e.g., traffic lights, vehicles, and trees.
  • Figure 4: Qualitative comparison with the state-of-the-art methods on KITTI. We highlight some fine-grained details, e.g., vehicles and lane lines.
  • Figure 5: Qualitative comparison on NOTR. We highlight some fine-grained details, e.g., vehicles and pedestrians.
  • ...and 1 more figure