Tactical Decision Making for Autonomous Trucks by Deep Reinforcement Learning with Total Cost of Operation Based Reward

Deepthi Pathare, Leo Laine, Morteza Haghir Chehreghani

TL;DR

This paper addresses tactical decision making for autonomous trucks by separating high-level decisions (e.g., time gap and lane choice) from low-level control (IDM-based longitudinal control and LC2013 lane changes) and by optimizing a reward based on Total Cost of Operation (TCOP). The authors compare an RL-only baseline architecture with a new architecture that delegates low-level control to physics-based controllers. They evaluate three DRL algorithms (DQN, A2C, PPO) in a SUMO highway environment and also test curriculum learning. The new architecture reduces collisions and improves target completion, while TCOP-based rewards, especially when weighted and normalized, improve safety, energy efficiency, and cost efficiency; curriculum learning offers mixed benefits. The work provides a cost-aware framework for autonomous truck control, demonstrates tooling built on the open-source SUMO simulator, and points to transfer learning as a promising future direction for broader traffic scenarios.
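
To make the split between high-level decisions and low-level control concrete, the following is a minimal sketch of the standard Intelligent Driver Model (IDM) acceleration law, in which a high-level agent would only choose the desired time gap. It is not the authors' implementation (the paper relies on SUMO's built-in controllers); the function name and all default parameter values are illustrative assumptions.

```python
import math


def idm_acceleration(v, v_lead, gap, time_gap,
                     v_desired=25.0, a_max=1.0, b_comf=1.5,
                     s0=2.0, delta=4.0):
    """Standard IDM acceleration in m/s^2.

    v, v_lead : ego and leader speed (m/s)
    gap       : bumper-to-bumper distance to the leader (m)
    time_gap  : desired time gap (s); the quantity a high-level agent would set
    All default values are assumptions chosen for illustration only.
    """
    dv = v - v_lead  # closing speed; positive when approaching the leader
    # Desired dynamic gap: standstill distance + time-gap term + braking term.
    s_star = s0 + max(0.0, v * time_gap + v * dv / (2.0 * math.sqrt(a_max * b_comf)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v_desired) ** delta - (s_star / gap) ** 2)


# Same traffic state, two hypothetical high-level choices of time gap.
short_gap_accel = idm_acceleration(v=20.0, v_lead=20.0, gap=40.0, time_gap=1.0)
long_gap_accel = idm_acceleration(v=20.0, v_lead=20.0, gap=40.0, time_gap=2.0)
```

In this sketch a larger time gap makes the controller back off in the same traffic state, so the learning agent can shape following behavior without outputting accelerations directly.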

Abstract

We develop a deep reinforcement learning framework for tactical decision making in an autonomous truck, specifically for Adaptive Cruise Control (ACC) and lane change maneuvers in a highway scenario. Our results demonstrate that it is beneficial to separate high-level decision-making processes and low-level control actions between the reinforcement learning agent and the low-level controllers based on physical models. We then study how to optimize performance with a realistic, multi-objective reward function based on the Total Cost of Operation (TCOP) of the truck, using three approaches: adding weights to the reward components, normalizing the reward components, and using curriculum learning techniques.
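
As a rough illustration of what weighting and normalizing reward components can look like, the sketch below scales each component by a reference magnitude and then applies a per-component weight. The component names, scales, and weights are hypothetical placeholders and are not taken from the paper's reward definition.

```python
def combine_reward(components, scales, weights):
    """Weighted sum of normalized reward components.

    components : raw per-step reward terms (negative values are costs)
    scales     : reference magnitude per term, used for normalization
    weights    : relative importance per term after normalization
    All keys and values used with this function here are hypothetical.
    """
    total = 0.0
    for name, value in components.items():
        normalized = value / scales[name]  # bring terms to comparable ranges
        total += weights[name] * normalized
    return total


# Hypothetical step without a collision.
scales = {"energy_cost": 1.0, "driver_cost": 0.4, "collision_penalty": 1000.0}
weights = {"energy_cost": 1.0, "driver_cost": 0.5, "collision_penalty": 2.0}
safe_step = {"energy_cost": -0.5, "driver_cost": -0.2, "collision_penalty": 0.0}
crash_step = {"energy_cost": -0.5, "driver_cost": -0.2, "collision_penalty": -1000.0}

safe_reward = combine_reward(safe_step, scales, weights)
crash_reward = combine_reward(crash_step, scales, weights)
```

Normalizing first keeps a single large-magnitude term from dominating the learning signal, and the weights then express the intended trade-off explicitly.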

Paper Structure (23 sections, 7 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Schematic diagram of the highway simulation environment. The truck in green color is the ego vehicle.
  • Figure 2: Overview of the baseline architecture.
  • Figure 3: Overview of the new architecture.
  • Figure 4: Comparison of episode rewards in the baseline architecture with and without leading vehicle distance in the state space.
  • Figure 5: Comparison of average episodic reward in the baseline and new architectures with different RL agents.
  • ...and 5 more figures