Federated Reinforcement Learning to Optimize Teleoperated Driving Networks
Filippo Bragato, Marco Giordani, Michele Zorzi
TL;DR
This work tackles Predictive Quality of Service (PQoS) for teleoperated driving (TD) by evaluating multiple reinforcement learning (RL) approaches in a federated learning (FL) setup, with the aim of optimizing LiDAR data compression and meeting strict latency targets. It introduces an ns-3-based PQoS framework (RAN-AI) that couples scenario generation, channel modeling, network simulation, and Draco-based compression, enabling fair, privacy-preserving distributed training. The study finds that stateful off-policy methods, especially Q-Learning, offer the best average performance with reasonable convergence time and computational cost, while the neural-network-based variants incur higher complexity for marginal gains. The results demonstrate the practical viability of FL-enabled PQoS in TD networks and point to future multi-layer PQoS schemes that jointly optimize application and RAN behavior to reduce learning outliers and improve robustness.
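To make the privacy-preserving FL idea concrete for the tabular agents: vehicles share only their learned Q-values, never their raw observations, and a server merges them into a global model. The text does not state which aggregation rule is used, so the sketch below assumes a FedAvg-style mean weighted by each vehicle's number of local updates; the function name `federated_average`, the weights, and the toy 2-state, 3-action tables are hypothetical.

```python
import numpy as np


def federated_average(q_tables, sample_counts):
    """Hypothetical FedAvg-style aggregation of tabular Q-functions.

    q_tables: list of arrays of shape (n_states, n_actions), one per vehicle.
    sample_counts: local updates per vehicle, used as aggregation weights (assumption).
    Only these Q-values leave the vehicles; raw network/LiDAR data stays local.
    """
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()                      # normalize weights to sum to 1
    stacked = np.stack(q_tables)                  # shape (n_clients, n_states, n_actions)
    return np.tensordot(weights, stacked, axes=1)  # weighted mean over clients


# Two hypothetical vehicles: 2 network states x 3 compression levels.
q_a = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
q_b = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 4.0]])
global_q = federated_average([q_a, q_b], sample_counts=[1, 3])
print(global_q)  # 0.25 * q_a + 0.75 * q_b
```

Weighting by local sample count lets vehicles that have explored more contribute more, but a plain unweighted mean is an equally plausible alternative that may favor fairness across vehicles.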
Abstract
Several sixth-generation (6G) use cases, most notably teleoperated driving (TD), have tight requirements in terms of reliability and latency. To address these requirements, Predictive Quality of Service (PQoS), possibly combined with reinforcement learning (RL), has emerged as a valid approach to dynamically adapt the configuration of the TD application (e.g., the level of compression of automotive data) to the experienced network conditions. In this work, we explore different classes of RL algorithms for PQoS, namely multi-armed bandit (MAB) (stateless), SARSA (stateful, on-policy), Q-Learning (stateful, off-policy), and DSARSA and DDQN (with neural network (NN) approximation). We train the agents in a federated learning (FL) setup to improve convergence time and fairness, and to promote privacy and security. The goal is to optimize the trade-off between Quality of Service (QoS), measured in terms of end-to-end latency, and Quality of Experience (QoE), measured in terms of the quality of the compressed data. We show that Q-Learning uses a small number of learnable parameters and is the best approach to perform PQoS in the TD scenario in terms of average reward, convergence, and computational cost.
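The on-policy versus off-policy distinction between SARSA and Q-Learning comes down to the bootstrap target of the temporal-difference update. The sketch below assumes a tabular Q-table whose rows are discretized network states and whose columns are compression levels; the learning rate, discount factor, reward weight `w`, and 50 ms deadline are illustrative assumptions, and the `reward` function is a hypothetical stand-in for the paper's QoS/QoE trade-off, not its actual definition. A stateless MAB agent corresponds to a single-row table with no bootstrap term.

```python
import numpy as np

# Hypothetical tabular setup: rows = discretized network states, columns = compression levels.


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy (max) action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action a_next the policy actually took."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])


def epsilon_greedy(Q, s, epsilon, rng):
    """With probability epsilon pick a random compression level, else the best one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


def reward(latency_ms, quality, deadline_ms=50.0, w=0.5):
    """Hypothetical trade-off: w * (latency deadline met) + (1 - w) * quality in [0, 1]."""
    qos = 1.0 if latency_ms <= deadline_ms else 0.0
    return w * qos + (1.0 - w) * quality


rng = np.random.default_rng(0)
Q_ql = np.zeros((2, 3))
Q_ql[1] = [0.0, 2.0, 0.5]
Q_sarsa = Q_ql.copy()

r = reward(latency_ms=30.0, quality=0.8)               # deadline met: 0.5 + 0.4
q_learning_update(Q_ql, s=0, a=1, r=r, s_next=1)        # bootstraps from max = 2.0
sarsa_update(Q_sarsa, s=0, a=1, r=r, s_next=1, a_next=2)  # bootstraps from 0.5
greedy = epsilon_greedy(Q_ql, s=1, epsilon=0.0, rng=rng)  # argmax of [0, 2, 0.5]
print(Q_ql[0, 1], Q_sarsa[0, 1])  # ≈ 0.27 vs ≈ 0.135
```

From the same transition, Q-Learning learns toward the best next compression level while SARSA learns toward the level its exploratory policy actually chose, which is why the two tables diverge even on identical data.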
