Tactical Decision Making for Autonomous Trucks by Deep Reinforcement Learning with Total Cost of Operation Based Reward
Deepthi Pathare, Leo Laine, Morteza Haghir Chehreghani
TL;DR
This paper addresses tactical decision making for autonomous trucks by separating high-level decisions (e.g., time gap and lane choice) from low-level control (IDM-based longitudinal control and LC2013 lane changes) and optimizing a Total Cost of Operation (TCOP) based reward. The authors compare a baseline RL-only architecture with a new architecture that delegates low-level control to physics-based controllers, and evaluate three DRL algorithms (DQN, A2C, PPO) in a SUMO highway environment, supplemented by curriculum learning. The new architecture dramatically reduces collisions and improves target completion. TCOP-based rewards, especially when weighted and normalized, improve safety, energy efficiency, and cost efficiency, while curriculum learning offers mixed benefits. The work provides a scalable, cost-aware framework for autonomous truck control, demonstrates open-source SUMO-based tooling, and points to transfer learning as a promising future direction for broader traffic scenarios.
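The split between high-level decisions and low-level control can be illustrated with a minimal sketch: the agent picks a desired time gap, and a car-following model turns that choice into an acceleration. The code below uses the textbook Intelligent Driver Model (IDM) equation; it is not SUMO's implementation or the authors' code, and the function name and all parameter values are illustrative assumptions.

```python
import math


def idm_acceleration(v, v_lead, gap, time_gap,
                     v_desired=25.0, a_max=1.0, b_comf=1.5,
                     s0=2.0, delta=4.0):
    """Textbook IDM acceleration in m/s^2.

    v, v_lead : ego and leader speeds (m/s)
    gap       : bumper-to-bumper distance to the leader (m)
    time_gap  : desired time gap (s), the quantity a high-level
                agent would select in this sketch
    All default parameter values are assumed for illustration only.
    """
    dv = v - v_lead  # closing speed, positive when approaching the leader
    # Desired dynamic gap: standstill distance plus a speed-dependent term.
    s_star = s0 + max(0.0, v * time_gap + v * dv / (2.0 * math.sqrt(a_max * b_comf)))
    return a_max * (1.0 - (v / v_desired) ** delta - (s_star / gap) ** 2)


# The same traffic state with two different high-level choices of time gap.
accel_short_gap = idm_acceleration(v=20.0, v_lead=20.0, gap=40.0, time_gap=1.0)
accel_long_gap = idm_acceleration(v=20.0, v_lead=20.0, gap=40.0, time_gap=2.0)
```

With the larger time gap the model asks the truck to slow down and open up the gap, while the smaller one still permits mild acceleration. The agent therefore influences longitudinal behaviour without issuing raw acceleration commands.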
Abstract
We develop a deep reinforcement learning framework for tactical decision making in an autonomous truck, specifically for Adaptive Cruise Control (ACC) and lane change maneuvers in a highway scenario. Our results demonstrate that it is beneficial to separate high-level decision-making processes from low-level control actions, assigning the former to the reinforcement learning agent and the latter to low-level controllers based on physical models. We then study how to optimize performance with a realistic, multi-objective reward function based on the Total Cost of Operation (TCOP) of the truck, using three approaches: adding weights to the reward components, normalizing the reward components, and using curriculum learning techniques.
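As a rough illustration of what weighting and normalizing a multi-component, cost-based reward can look like, the sketch below combines a few per-step cost terms into one scalar. The component names, prices, penalty, weights, and scales are all hypothetical and are not taken from the paper; only the general pattern (scale each cost to a comparable range, then apply a weight) is the point.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Hypothetical per-step quantities reported by a simulator."""
    energy_kwh: float
    elapsed_s: float
    collided: bool


def tcop_style_reward(costs, weights, scales,
                      energy_price=0.5, driver_price_per_s=0.01,
                      collision_penalty=1000.0):
    """Return a negative, weighted, normalized cost as the reward.

    Prices and the collision penalty are assumed values in arbitrary
    currency units. Each raw cost is divided by its scale so that terms
    of very different magnitude become comparable before weighting.
    """
    raw = {
        "energy": costs.energy_kwh * energy_price,
        "driver": costs.elapsed_s * driver_price_per_s,
        "collision": collision_penalty if costs.collided else 0.0,
    }
    return -sum(weights[k] * raw[k] / scales[k] for k in raw)


# Assumed normalization scales (roughly the largest expected per-step cost)
# and equal weights for every component.
scales = {"energy": 0.1, "driver": 0.01, "collision": 1000.0}
weights = {"energy": 1.0, "driver": 1.0, "collision": 1.0}

safe_step = StepCosts(energy_kwh=0.1, elapsed_s=1.0, collided=False)
crash_step = StepCosts(energy_kwh=0.1, elapsed_s=1.0, collided=True)

reward_safe = tcop_style_reward(safe_step, weights, scales)
reward_crash = tcop_style_reward(crash_step, weights, scales)
```

Without normalization, the collision term in this toy setup would outweigh the energy and driver terms by several orders of magnitude. After scaling, the weights alone decide the trade-off between the components.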
