Optimizing Vehicular Networks with Variational Quantum Circuits-based Reinforcement Learning

Zijiang Yan; Ramsundar Tanikella; Hina Tabassum

TL;DR

The paper tackles the joint optimization of autonomous-vehicle kinematics and network decisions in vehicular networks (VNets), with the goal of ensuring safety and reliable connectivity under stochastic channels. It proposes a Variational Quantum Circuit (VQC)-based Multi-Objective Reinforcement Learning (MORL) framework that formulates the task as a multi-objective Markov decision process (MOMDP) and approximates the Q-function with a 5-qubit, 3-layer circuit, $Q(s,a;\theta)=\langle O_a\rangle_{s,\theta}$. Training relies on a Bellman-based loss $\mathcal{L}(\theta)$, experience replay, and an $\epsilon$-greedy policy. The framework converges faster and earns higher rewards than DDQN, with reported gains of $31.32\%$ in convergence speed and $\approx 18.64\%$ in rewards. These results highlight the potential of quantum agents for real-time, multi-objective decision-making in dynamic VNets that integrate RF and THz base stations. The work offers a scalable, quantum-enhanced approach to jointly optimizing handover decisions, data rates, and vehicle trajectories in complex wireless-vehicular environments.
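To make the quantities named in the summary concrete, the following is a minimal NumPy sketch of a VQC Q-function, an $\epsilon$-greedy action choice, and a squared Bellman error. It is not the authors' implementation. The summary fixes only the circuit size (5 qubits, 3 layers) and the form $Q(s,a;\theta)=\langle O_a\rangle_{s,\theta}$. Everything else here is an assumption made for illustration: the RY angle encoding, the ring of CZ gates, the choice $O_a = Z_a$ (one action per qubit), the 5-dimensional state, the discount factor, and the separate target parameters `theta_target`.

```python
import numpy as np

N_QUBITS, N_LAYERS = 5, 3  # circuit size quoted in the summary


def ry(angle):
    """Single-qubit RY rotation matrix (real-valued)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state tensor of shape (2,) * N_QUBITS."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cz(state, q1, q2):
    """Controlled-Z: negate amplitudes where both qubits are |1>."""
    state = state.copy()
    idx = [slice(None)] * N_QUBITS
    idx[q1], idx[q2] = 1, 1
    state[tuple(idx)] *= -1
    return state


def q_values(s, theta):
    """Return Q(s, a; theta) = <Z_a> for a = 0..N_QUBITS-1 (assumed observables)."""
    state = np.zeros((2,) * N_QUBITS)
    state[(0,) * N_QUBITS] = 1.0  # start in |00000>
    for layer in range(N_LAYERS):
        for q in range(N_QUBITS):
            state = apply_1q(state, ry(s[q]), q)             # state encoding (assumed)
            state = apply_1q(state, ry(theta[layer, q]), q)  # trainable rotation
        for q in range(N_QUBITS):
            state = apply_cz(state, q, (q + 1) % N_QUBITS)   # ring entanglement (assumed)
    probs = state ** 2  # amplitudes stay real because only RY and CZ are used
    values = []
    for a in range(N_QUBITS):
        other_axes = tuple(i for i in range(N_QUBITS) if i != a)
        marginal = probs.sum(axis=other_axes)
        values.append(marginal[0] - marginal[1])  # <Z> = P(0) - P(1)
    return np.array(values)


def epsilon_greedy(q, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))


def bellman_loss(batch, theta, theta_target, gamma=0.99):
    """Mean squared Bellman error over (s, a, r, s_next, done) transitions."""
    errors = []
    for s, a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * q_values(s_next, theta_target).max()
        errors.append((r + bootstrap - q_values(s, theta)[a]) ** 2)
    return float(np.mean(errors))
```

In a training loop, `batch` would be sampled from the replay buffer and `theta` updated to reduce `bellman_loss`, for example with parameter-shift or finite-difference gradients. Because each $\langle Z_a\rangle$ lies in $[-1, 1]$, a practical agent would typically rescale the outputs to match the reward range.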

Abstract

In vehicular networks (VNets), ensuring both road safety and dependable network connectivity is of utmost importance. Achieving this requires resilient and efficient decision-making policies that balance multiple objectives. In this paper, we develop a Variational Quantum Circuit (VQC)-based multi-objective reinforcement learning (MORL) framework to characterize efficient network selection and autonomous driving policies in a VNet. Numerical results show notable improvements in both convergence rate and rewards compared to conventional deep-Q networks (DQNs), validating the efficacy of the VQC-MORL solution.

Paper Structure (6 sections, 3 equations, 2 figures, 1 algorithm)

Introduction
System Model and Assumptions
MOMDP Formulation and VQC-MORL
MOMDP Formulation
Proposed VQC-MORL Solution
Numerical Results and Discussions

Figures (2)

Figure 1: Training performance (ego vehicle): (a) total telecommunication reward, (b) total transport reward, (c) collision rate.
Figure 2: Testing performance (ego vehicle): (a) total telecommunication reward, (b) total transport reward, (c) total reward. The considered VQC architecture has 5 qubits and 3 layers.
