Joint Adaptive OFDM and Reinforcement Learning Design for Autonomous Vehicles: Leveraging Age of Updates
Mamady Delamou, Ahmed Naeem, Huseyin Arslan, El Mehdi Amhoud
TL;DR
The paper tackles dynamic V2V mmWave ISAC by jointly adapting OFDM waveform parameters and using reinforcement learning that exploits queue and channel state information. It introduces an age-of-updates (AoU) based reward to balance timely updates against sensing velocity resolution, and evaluates both Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) in this setting. Key contributions include integrating adaptive modulation with frame scheduling, detailed system and sensing models, and an RL formulation based on the value function $V_{\pi}$, the action-value function $Q_{\pi}$, and the advantage function $A_{\pi}$. Simulations show improved queue stability, fewer dropped packets, and finer velocity resolution compared to prior work. The results highlight the practical value of AoU-aware control in dynamic AV networks, offering a pathway to robust, high-throughput ISAC for autonomous mobility.
Abstract
Millimeter-wave (mmWave) orthogonal frequency-division multiplexing (OFDM) is a strong candidate for high-resolution sensing and high-speed data transmission. To meet communication and sensing requirements, many works adopt a static configuration in which waveform hyperparameters, such as the number of symbols in a frame and the number of frames in a communication slot, are fixed in advance. Two facts call for revisiting this problem: (1) the environment is often dynamic and uncertain, and (2) mmWave links are severely affected by the wireless environment. Autonomous vehicles (AVs) are a striking example where this challenge is prominent. Such systems rely on mmWave integrated sensing and communication (ISAC) to handle both data transmission and the dynamics of the environment. In this work, we consider an AV network in which an AV uses its queue state information (QSI) and channel state information (CSI), together with reinforcement learning, to manage communication and sensing. This enables the AV to achieve two primary objectives: establishing a stable communication link with other AVs and estimating the velocities of surrounding objects with high resolution. Communication performance is evaluated through the queue state, the effective data rate, and the discarded packet rate, whereas sensing performance is assessed through the velocity resolution. In addition, we exploit adaptive OFDM for dynamic modulation, and we propose a reward function that leverages the age of updates (AoU) to manage the communication buffer and improve sensing. The system is validated using advantage actor-critic (A2C) and proximal policy optimization (PPO). Finally, we compare our solution with an existing design and demonstrate its superior performance through computer simulations.
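To illustrate why the number of symbols in a frame matters for sensing, the sketch below uses the standard Doppler relation for OFDM radar, in which velocity resolution equals the wavelength divided by twice the coherent observation time. It is a minimal sketch and not the paper's sensing model: the function name `velocity_resolution`, the 60 GHz carrier, and the 10 µs symbol duration are assumptions chosen only for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def velocity_resolution(carrier_hz, n_symbols, symbol_duration_s):
    """Doppler velocity resolution in m/s for one OFDM frame.

    Uses the textbook relation: wavelength / (2 * observation time),
    where the observation time is n_symbols * symbol_duration_s.
    """
    wavelength = C / carrier_hz
    observation_time = n_symbols * symbol_duration_s
    return wavelength / (2.0 * observation_time)


if __name__ == "__main__":
    # Hypothetical waveform settings, not taken from the paper.
    carrier_hz = 60e9
    symbol_duration_s = 10e-6
    for n_symbols in (64, 256, 1024):
        res = velocity_resolution(carrier_hz, n_symbols, symbol_duration_s)
        print(f"{n_symbols:5d} symbols -> {res:.3f} m/s")
```

Under these assumptions, longer frames give finer velocity resolution but occupy the channel for longer, which is the kind of tension between sensing quality and timely updates that an AoU-aware reward is meant to balance.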
