StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency
Yuyin Chen, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Yifei Zhan, Xianpeng Lang
TL;DR
StyledStreets presents a multi-style street simulation framework that preserves spatial and temporal consistency during appearance edits. It combines motion-adaptive Street Gaussians, a hybrid geometry–appearance embedding, and uncertainty-aware diffusion guidance to achieve photorealistic style transfer across seven vehicle-mounted cameras while maintaining geometric fidelity. The method demonstrates state-of-the-art geometric accuracy on the Waymo Open Dataset, with notable improvements in vehicle reconstruction ($+2.15$ dB PSNR) and depth accuracy (Chamfer distance reduction of $18\%$) compared to baselines, and shows robust performance under extreme weather edits. This work enables reliable urban digital twins and autonomous-vehicle testing with controllable environmental variations and stable multi-view consistency, advancing practical deployment in simulation-driven perception and planning tasks.
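The two metrics quoted above, PSNR and Chamfer distance, have standard definitions, and a minimal sketch may help when reading the reported gains. The function names are illustrative and not taken from the authors' code. The Chamfer variant shown here (symmetric, unsquared, mean over points) is an assumption, since the summary does not say which variant the evaluation uses.

```python
import math


def psnr(rendered, reference, max_val=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - g) ** 2 for r, g in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def _mean_nearest(src, dst):
    """Mean Euclidean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (assumed variant: unsquared, mean per direction)."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest(points_a, points_b) + _mean_nearest(points_b, points_a)
```

Because PSNR is logarithmic, a gain of $+2.15$ dB corresponds to a mean squared error lower by a factor of $10^{0.215} \approx 1.64$. The brute-force nearest-neighbour search is quadratic in the number of points, so a spatial index would be needed for real LiDAR-scale point clouds.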
Abstract
Urban scene reconstruction requires modeling both static infrastructure and dynamic elements while supporting diverse environmental conditions. We present \textbf{StyledStreets}, a multi-style street simulator that achieves instruction-driven scene editing with guaranteed spatial and temporal consistency. Building on a state-of-the-art Gaussian Splatting framework for street scenarios, enhanced by our proposed pose optimization and multi-view training, our method enables photorealistic style transfers across seasons, weather conditions, and camera setups through three key innovations. First, a hybrid embedding scheme disentangles persistent scene geometry from transient style attributes, allowing realistic environmental edits while preserving structural integrity. Second, uncertainty-aware rendering mitigates supervision noise from diffusion priors, enabling robust training across extreme style variations. Third, a unified parametric model prevents geometric drift through regularized updates, maintaining multi-view consistency across seven vehicle-mounted cameras. Our framework preserves the original scene's motion patterns and geometric relationships. Qualitative results demonstrate plausible transitions between diverse conditions (snow, sandstorm, night), while quantitative evaluations show state-of-the-art geometric accuracy under style transfers. The approach establishes new capabilities for urban simulation, with applications in autonomous vehicle testing and augmented reality systems that require reliable environmental consistency. Code will be made publicly available upon publication.
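The abstract does not give the form of the uncertainty-aware objective. As a rough illustration of how per-pixel uncertainty can damp noisy diffusion supervision, the sketch below assumes a standard heteroscedastic (log-variance) weighting, which may differ from the authors' formulation. The function name and its arguments are hypothetical.

```python
import math


def uncertainty_weighted_loss(rendered, target, log_var):
    """Mean of 0.5 * exp(-s) * (r - t)^2 + 0.5 * s over pixels, with s = log-variance.

    Assumed form, not the paper's: a large predicted s shrinks the residual
    term for unreliable pixels, while the + 0.5 * s term penalises claiming
    high uncertainty everywhere.
    """
    if not (len(rendered) == len(target) == len(log_var)) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, t, s in zip(rendered, target, log_var):
        total += 0.5 * math.exp(-s) * (r - t) ** 2 + 0.5 * s
    return total / len(rendered)
```

With all log-variances at zero this reduces to half the mean squared error. For a pixel where the diffusion-edited target disagrees strongly with the rendering, raising the log-variance lowers the total loss, so the optimizer can discount that pixel instead of distorting geometry to fit it.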
