HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles

Yifan Wang, Francesco Pittaluga, Zaid Tasneem, Chenyu You, Manmohan Chandraker, Ziyu Jiang

TL;DR

HorizonForge is a unified framework that reconstructs driving scenes as editable Gaussian Splats and Meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. It establishes a simple yet powerful paradigm for photorealistic, controllable driving simulation.

Abstract

Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation, yet existing approaches struggle to jointly achieve photorealism and precise control. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. Edits are rendered through a noise-aware video diffusion process that enforces spatial and temporal consistency, producing diverse scene variations in a single feed-forward pass without per-trajectory optimization. To standardize evaluation, we further propose HorizonSuite, a comprehensive benchmark spanning ego- and agent-level editing tasks such as trajectory modifications and object manipulation. Extensive experiments show that the Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations, and that temporal priors from video diffusion are essential for coherent synthesis. Combining these findings, HorizonForge establishes a simple yet powerful paradigm for photorealistic, controllable driving simulation, achieving an 83.4% user-preference gain and a 25.19% FID improvement over the second-best state-of-the-art method. Project page: https://horizonforge.github.io/

Paper Structure

This paper contains 30 sections, 28 equations, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: HorizonForge generates high-quality driving scenes that follow the provided manipulation instructions. The top two rows show the ego car shifted to the right, while the bottom two rows show a gray sedan inserted in front of the selected SUV at the location marked by the red box.
  • Figure 2: Overview of the HorizonForge framework. Given the original video and trajectory, we first extract the corresponding 3D assets according to the manipulated novel trajectories, then feed them into our rendering model to produce the final results.
  • Figure 3: A demonstration of the 3D mesh harvesting pipeline.
  • Figure 4: A demonstration of constructing data pairs for Gaussian Splats.
  • Figure 5: A demonstration of mesh-Gaussian training data pairs. The top frames are the original Gaussian Splats and the bottom ones are Gaussian Splats with mesh vehicle replacements.
  • ...and 3 more figures
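
The ego-car shift mentioned in the Figure 1 caption can be pictured as a lateral offset applied to every pose of the ego trajectory. The snippet below is only a minimal sketch of that idea, not the authors' implementation: the function name `shift_trajectory_right`, the `(x, y, yaw)` pose format, the coordinate convention, and the sample values are all assumptions made for illustration.

```python
import math

def shift_trajectory_right(poses, offset_m):
    """Shift each (x, y, yaw) pose sideways, to the right of its heading.

    Assumed convention (not taken from the paper): x and y are world-frame
    metres, yaw is in radians measured counterclockwise from the +x axis.
    """
    shifted = []
    for x, y, yaw in poses:
        # Unit vector pointing to the right of the heading (cos yaw, sin yaw).
        right_x, right_y = math.sin(yaw), -math.cos(yaw)
        shifted.append((x + offset_m * right_x, y + offset_m * right_y, yaw))
    return shifted

# Hypothetical ego trajectory driving along +x, moved 2 m to the right.
original = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edited = shift_trajectory_right(original, 2.0)
```

In a pipeline like the one outlined in Figure 2, an edited trajectory of this kind would presumably determine the novel viewpoints from which the 3D assets are rendered; the headings are left unchanged here so that only the lateral position differs.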