
Federated Reinforcement Learning to Optimize Teleoperated Driving Networks

Filippo Bragato, Marco Giordani, Michele Zorzi

TL;DR

This work tackles PQoS for teleoperated driving (TD) by evaluating multiple RL approaches under a federated learning (FL) setup to optimize LiDAR compression and meet strict latency targets. It introduces an ns-3-based PQoS framework (RAN-AI) that couples scenarios, channels, networking, and Draco-based compression, enabling fair, privacy-preserving distributed training. The study finds that stateful, off-policy methods, especially Q-Learning, offer the best average performance with reasonable convergence time and computational cost, while neural-network-based variants incur higher complexity for marginal gains. The results demonstrate the practical viability of FL-enabled PQoS in TD networks and point to future multi-layer PQoS schemes that jointly optimize application and RAN behavior to reduce learning outliers and improve robustness.
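The federated training idea can be pictured with a minimal sketch, assuming tabular agents (e.g., Q-Learning) whose Q-tables share the same state and action spaces and are aggregated by FedAvg-style element-wise averaging. The function name `fedavg_q_tables`, the optional per-agent weights, and the toy table sizes are hypothetical; the paper's actual aggregation rule and round schedule are not described in this section.

```python
import numpy as np


def fedavg_q_tables(local_tables, weights=None):
    """Aggregate per-agent Q-tables by (weighted) element-wise averaging.

    Hypothetical helper, FedAvg-style; not the paper's implementation.
    local_tables: list of arrays, all with shape (n_states, n_actions).
    weights: optional per-agent weights (e.g., number of local samples);
             defaults to a uniform average.
    """
    stacked = np.stack(local_tables)  # shape (n_agents, n_states, n_actions)
    if weights is None:
        weights = np.ones(len(local_tables))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the agent axis: sum_i w[i] * stacked[i]
    return np.tensordot(w, stacked, axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_agents, n_states, n_actions = 5, 4, 3  # toy sizes (hypothetical)
    # Stand-in for the Q-tables each vehicle holds after one round of local training.
    local = [rng.random((n_states, n_actions)) for _ in range(n_agents)]
    global_q = fedavg_q_tables(local)
    # Broadcast: every agent starts the next round from the global table.
    local = [global_q.copy() for _ in range(n_agents)]
    print(global_q.shape)
```

In this sketch only Q-values leave each agent, never raw LiDAR data or local trajectories, which is the usual privacy argument for FL; weighting by the number of local samples is a common alternative to the uniform mean.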

Abstract

Several sixth generation (6G) use cases, in particular teleoperated driving (TD), have tight requirements in terms of reliability and latency. To address those requirements, Predictive Quality of Service (PQoS), possibly combined with reinforcement learning (RL), has emerged as a valid approach to dynamically adapt the configuration of the TD application (e.g., the level of compression of automotive data) to the experienced network conditions. In this work, we explore different classes of RL algorithms for PQoS, namely the stateless multi-armed bandit (MAB), the stateful on-policy SARSA, the stateful off-policy Q-Learning, and DSARSA and DDQN, which use Neural Network (NN) approximation. We trained the agents in a federated learning (FL) setup to improve convergence time and fairness, and to promote privacy and security. The goal is to optimize the trade-off between Quality of Service (QoS), measured in terms of the end-to-end latency, and Quality of Experience (QoE), measured in terms of the quality of the resulting compression operation. We show that Q-Learning, which uses a small number of learnable parameters, is the best approach to perform PQoS in the TD scenario in terms of average reward, convergence, and computational cost.
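The difference between the stateless, on-policy, and off-policy classes named in the abstract comes down to the update rule. The sketch below uses standard textbook tabular updates together with a hypothetical reward that trades a delay budget (QoS) against a normalized compression-quality score such as mAP (QoE). The state/action discretization, the 50 ms budget, the weight, and all function names are illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def bandit_update(values, a, r, alpha=0.1):
    """Stateless MAB: one value per action (e.g., per compression level)."""
    values[a] += alpha * (r - values[a])
    return values


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD: bootstrap from the action a_next the policy actually took."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD: bootstrap from the greedy action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


def hypothetical_reward(delay_ms, quality, delay_budget_ms=50.0, weight=0.5):
    """Illustrative QoS/QoE trade-off (NOT the paper's reward):
    +weight if the delay meets the budget, plus (1 - weight) * quality in [0, 1]."""
    qos = 1.0 if delay_ms <= delay_budget_ms else 0.0
    return weight * qos + (1.0 - weight) * quality


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy table: 3 discretized network states x 4 compression levels (hypothetical).
    Q = np.zeros((3, 4))
    s = 0
    a = epsilon_greedy(Q, s, epsilon=0.1, rng=rng)
    r = hypothetical_reward(delay_ms=30.0, quality=0.8)
    q_learning_update(Q, s, a, r, s_next=1)
    print(Q)
```

The two TD rules differ only in the bootstrap term, the maximum over `Q[s_next]` versus `Q[s_next, a_next]`. A tabular agent stores `n_states * n_actions` values, which is why its learnable-parameter count stays small compared with NN-based variants such as DSARSA and DDQN.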

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Average reward at the end of the training. We set $N=5$.
  • Figure 2: Average reward over the first 100 steps of the first episode (top) and at the end of the training (bottom), for $N=5$.
  • Figure 3: Average application delay with $N=10$. Plain (striped) bars are for the RL schemes (constant benchmarks).
  • Figure 4: Average mAP with $N=10$. Plain (striped) bars are for the RL schemes (constant benchmarks).
  • Figure 5: Average reward after training. Plain (striped) bars are for the RL schemes (constant benchmarks). We set $N=10$.