Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?

Rex Chen; Kathleen M. Carley; Fei Fang; Norman Sadeh

TL;DR

This paper compares two popular microscopic traffic simulators, CityFlow and SUMO, to determine whether they produce distributionally equivalent low-level outputs when used to train intelligent transportation systems (ITSs) based on reinforcement learning (RL). Through controlled virtual experiments that vary driver behavior and network scale, the authors show significant differences in instantaneous outcomes, as measured by root mean squared error (RMSE) and KL divergence, challenging the assumption that simulators are interchangeable for RL training. The study highlights how model choices (car-following, lane-changing), network scale, and heterogeneity influence the fidelity of simulation data, and it argues for multi-simulator validation and careful consideration of veridicality versus efficiency. For RL researchers working on ITSs, the practical implication is that choosing or combining simulators should be guided by the intended deployment scenario and data-fidelity requirements, especially as real-world validation remains challenging but is becoming more feasible with connected-vehicle data.

Abstract

Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.
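As a rough illustration of the two measures named in the abstract, the sketch below computes the RMSE between two time-aligned series and the KL divergence between their binned empirical distributions. It is not the authors' procedure: the queue-length values are made up, and the function names, the shared bin edges, and the small smoothing constant `eps` are assumptions chosen only to keep the example self-contained.

```python
import math


def rmse(xs, ys):
    """Root mean squared error between two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def histogram(values, edges, eps=1e-9):
    """Normalised histogram over shared bin edges.

    eps is an assumed smoothing constant that keeps every bin non-zero,
    so the KL divergence below stays finite. Values outside the edges
    are silently ignored.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            # The last bin is closed on the right so the maximum edge counts.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical per-step queue lengths from two simulators (illustrative only).
queue_a = [2, 3, 5, 4, 6, 7, 5, 3]
queue_b = [2, 4, 4, 5, 8, 9, 6, 3]
edges = [0, 2, 4, 6, 8, 10]

p = histogram(queue_a, edges)
q = histogram(queue_b, edges)
print("RMSE:", rmse(queue_a, queue_b))
print("KL(a || b):", kl_divergence(p, q))
```

Both quantities equal 0 only when the two outputs coincide (step by step for RMSE, bin by bin for KL divergence), which is why values significantly greater than 0 count as evidence against distributional equivalence.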

Paper Structure (18 sections, 2 figures, 1 table)

Introduction
Related Work
Validating Traffic Simulators
Modelling Driver Behavior
Traffic Simulators for RL
RL for Transportation
Comparing CityFlow and SUMO
Varying Driver Behavior
Car-Following Models
Lane-Changing Models
Experimental Setup
Experiment 1: Traffic Demand
Experiment 2: Network Scale
Experimental Results
Experiment 1: Traffic Demand
...and 3 more sections

Figures (2)

Figure 1: Screenshots in CityFlow of the arterial4x4 and grid4x4 road networks.
Figure 2: Screenshots in CityFlow of the ingolstadt1 and ingolstadt7 road networks.
