A Functional Learning Approach for Team-Optimal Traffic Coordination

Weihao Sun, Gehui Xu, Alessio Moreschini, Thomas Parisini, Andreas A. Malikopoulos

Abstract

In this paper, we develop a kernel-based policy iteration functional learning framework for computing team-optimal strategies in traffic coordination problems. We consider a multi-agent discrete-time linear system with a cost function that combines quadratic regulation terms and nonlinear safety penalties. Building on the Hilbert space formulation of offline receding-horizon policy iteration, we seek approximate solutions within a reproducing kernel Hilbert space, where the policy improvement step is implemented via a discrete Fréchet derivative. We further study the model-free receding-horizon scenario, where the system dynamics are estimated using recursive least squares, followed by updating the policy using rolling online data. The proposed method is tested in signal-free intersection scenarios via both model-based and model-free simulations and validated in SUMO.
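The model-free variant estimates the linear dynamics from trajectory data via recursive least squares before each policy update. As a rough illustration of that identification ingredient only (a minimal sketch, not the paper's implementation; the function name `rls_identify` and its parameters are hypothetical):

```python
import numpy as np

def rls_identify(xs, us, lam=1.0):
    """Estimate [A B] in x_{t+1} = A x_t + B u_t via recursive least squares.

    xs: (T+1, n) state trajectory; us: (T, m) input sequence;
    lam: forgetting factor in (0, 1] (1.0 = standard least squares).
    Returns estimates of A (n x n) and B (n x m).
    """
    n, m = xs.shape[1], us.shape[1]
    theta = np.zeros((n, n + m))        # stacked parameter estimate [A B]
    P = 1e3 * np.eye(n + m)             # covariance; large initial value
    for t in range(us.shape[0]):
        phi = np.concatenate([xs[t], us[t]])   # regressor [x_t; u_t]
        y = xs[t + 1]                          # target x_{t+1}
        K = P @ phi / (lam + phi @ P @ phi)    # RLS gain vector
        theta += np.outer(y - theta @ phi, K)  # correct estimate by residual
        P = (P - np.outer(K, phi @ P)) / lam   # covariance downdate
    return theta[:, :n], theta[:, n:]
```

With persistently exciting inputs and noiseless data, the estimate converges to the true `[A B]` up to the small bias introduced by the initial covariance.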

Paper Structure

This paper contains 18 sections, 3 theorems, 37 equations, 7 figures, and 2 algorithms.

Key Result

Theorem 1

Consider the policy iteration defined by the cost functional $\tilde{J}_t$ in eq:piimprove_cost. Let the policy update at each iteration $k$ be governed by the implicit rule eq:implicit_update with learning rate $\delta>0$. Then, for every iteration $k$ and any $\delta>0$, the updated policy does not increase the cost. Therefore, the sequence $\{\widetilde{J}_0(\pi_{0:T-1}^{k})\}_{k\ge0}$ is monotonically non-increasing and converges as $k\to\infty$.
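The key feature of the theorem is that the implicit update is cost-non-increasing for any step size $\delta>0$. A toy sanity check of this property (a sketch under strong simplifying assumptions: a quadratic cost with illustrative matrix `Q`, not the paper's functional $\tilde{J}_t$ or update rule) uses the proximal-point form of an implicit gradient step:

```python
import numpy as np

# Toy quadratic stand-in for a cost functional (Q is illustrative).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
J = lambda p: 0.5 * p @ Q @ p

# Implicit update pi_{k+1} = pi_k - delta * grad J(pi_{k+1});
# for a quadratic J it solves to pi_{k+1} = (I + delta*Q)^{-1} pi_k.
delta = 10.0  # deliberately large: an explicit gradient step would diverge here
M = np.linalg.inv(np.eye(2) + delta * Q)

pi = np.array([3.0, -2.0])
costs = [J(pi)]
for _ in range(20):
    pi = M @ pi
    costs.append(J(pi))
# costs is monotonically non-increasing, regardless of how large delta is
```

The contrast with an explicit step (which diverges for this `delta`, since the eigenvalues of $I-\delta Q$ exceed one in magnitude) illustrates why the implicit rule admits the "any $\delta>0$" guarantee.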

Figures (7)

  • Figure 3: Offline cost (left) and vehicle positions (right).
  • Figure 4: Vehicle speeds (left) and controlled acceleration (right).
  • Figure 5: RLS identification error (left) and rolling window cost (right).
  • Figure 6: CAV control inputs (left) and vehicle speeds (right).
  • Figure 7: Pairwise vehicle distances (left) and trajectories (right).
  • ...and 2 more figures

Theorems & Definitions (9)

  • Definition 1: Team-optimal Solution
  • Definition 2: Fréchet Differentiable
  • Definition 3: Fréchet Derivative
  • Theorem 1
  • Proof
  • Lemma 1
  • Proof
  • Theorem 2
  • Proof