Generalizing Cooperative Eco-driving via Multi-residual Task Learning

Vindula Jayawardana; Sirui Li; Cathy Wu; Yashar Farid; Kentaro Oguchi

TL;DR

This work tackles the algorithmic generalization of deep reinforcement learning (DRL) for contextual, multi-agent control by introducing Multi-residual Task Learning (MRTL), which augments a nominal, model-based policy with a learned residual so that a single policy can operate across diverse traffic scenarios. Applying MRTL to cooperative eco-driving at signalized intersections, the authors report improved emission reductions and throughput across a large set of contexts (nearly 600 intersections, 1200 traffic scenarios) and AV penetration levels, outperforming baselines and remaining robust to noise. The key idea is to decompose the control objective into a known, tractable component handled by the nominal policy $\pi_n$ and a residual component $f_\theta$ learned by DRL, with the final policy given by $\pi(s,c) = \pi_n(s,c) + f_\theta(s,c)$, where $s$ is the state and $c$ the context. The approach offers practical benefits for fleet-level emissions management and shows how existing model-based strategies can aid DRL generalization in complex, real-world traffic settings.
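The additive form $\pi(s,c) = \pi_n(s,c) + f_\theta(s,c)$ can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `make_mrtl_policy`, `toy_nominal`, and `make_linear_residual`, the proportional speed-tracking rule standing in for the nominal policy, the linear function standing in for the learned network, and the clipping of the summed action to acceleration limits.

```python
from typing import Callable, Sequence

State = Sequence[float]    # hypothetical: (ego speed in m/s,)
Context = Sequence[float]  # hypothetical: (target speed in m/s,)
Policy = Callable[[State, Context], float]


def make_mrtl_policy(nominal: Policy, residual: Policy,
                     a_min: float = -3.0, a_max: float = 3.0) -> Policy:
    """Compose pi(s, c) = pi_n(s, c) + f_theta(s, c).

    Clipping to [a_min, a_max] is an illustrative assumption,
    not something stated in the source.
    """
    def policy(s: State, c: Context) -> float:
        action = nominal(s, c) + residual(s, c)
        return max(a_min, min(a_max, action))
    return policy


def toy_nominal(s: State, c: Context) -> float:
    """Hypothetical nominal controller: proportional tracking of a target speed."""
    return 0.5 * (c[0] - s[0])


def make_linear_residual(theta: Sequence[float]) -> Policy:
    """Stand-in for the learned residual f_theta: linear in (s, c, bias)."""
    def residual(s: State, c: Context) -> float:
        features = list(s) + list(c) + [1.0]
        return sum(w * x for w, x in zip(theta, features))
    return residual


# With all-zero parameters the residual vanishes and the combined
# policy reproduces the nominal controller exactly.
untrained = make_mrtl_policy(toy_nominal, make_linear_residual([0.0, 0.0, 0.0]))

# A nonzero bias term shifts the nominal action by a constant correction.
corrected = make_mrtl_policy(toy_nominal, make_linear_residual([0.0, 0.0, 0.25]))
```

In this sketch, a residual with zero parameters leaves the nominal controller's behavior unchanged, so learning only has to supply corrections where the nominal policy falls short. A real implementation would replace the linear residual with a neural network trained by a DRL algorithm.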

Abstract

Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control policies that generalize to multiple traffic scenarios is still a challenge. To address this, we introduce Multi-residual Task Learning (MRTL), a generic learning framework based on multi-task learning that, for a set of task scenarios, decomposes the control into nominal components that are effectively solved by conventional control methods and residual terms which are solved using learning. We employ MRTL for fleet-level emission reduction in mixed traffic using autonomous vehicles as a means of system control. By analyzing the performance of MRTL across nearly 600 signalized intersections and 1200 traffic scenarios, we demonstrate that it emerges as a promising approach to synergize the strengths of DRL and conventional methods in generalizable control.

Paper Structure (20 sections, 5 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related Work
Preliminaries
Reinforcement Learning
Multi-task Reinforcement Learning
Method
Problem Formulation
Cooperative Eco-driving cMDP
Multi-residual Task Learning
MRTL for Cooperative Eco-driving
Nominal Policy
What makes the nominal policy suboptimal?
MRTL Implementation Details
Experimental Results
Baselines
...and 5 more sections

Figures (5)

Figure 1: In a signalized intersection, AVs lead platoons of human-driven vehicles. As Lagrangian actuators, they reduce fleet emissions by controlling their own acceleration and shepherding the human drivers through car-following dynamics.
Figure 2: Multi-task learning trains a unified policy directly with environments (intersections) sampled from a distribution of environments (top figure). Multi-residual task learning, building on multi-task learning, decomposes the cMDP into parts solved by a nominal policy and residual parts solved by DRL, as shown in the bottom figure.
Figure 3: t-SNE plots of emission benefits (higher is better), used to assess how well the MRTL policy mitigates the limitations of the nominal policy. t-SNE reduces the vectors describing incoming approaches to a two-dimensional space (latent dimensions 1 and 2 in the figures). Each data point is therefore an incoming approach, and the color denotes the emission benefits (a) with partial guided AV penetration (20%), (b) in the presence of protected left turns, and (c) when dealing with unprotected left turns. In all cases, the MRTL policy outperforms the nominal policy, as evidenced by the predominance of blue data points in the lower-row figures compared to the upper-row figures.
Figure 4: Effect of control noise (left) and bias noise (right) on emissions.
Figure 5: Schematic interpretation of MRTL in policy search. Left: MRTL enables better policy search initialization compared to initializing from scratch. Right: A concrete example from eco-driving at signalized intersections.
