Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving

Zijiang Yan; Hao Zhou; Hina Tabassum; Xue Liu

TL;DR

This work tackles the joint optimization of V2I communications and autonomous driving policies in an RF-THz highway environment using a hybrid LLM-DDQN framework. An LLM-based AD decision module produces driving actions that are incorporated into the V2I MDP solved by DDQN, and the two components are optimized iteratively until convergence. The LLM is guided by language-based task descriptions, distance-based demonstration retrieval, and an experience pool of good and bad past examples, while a DDQN with a target network optimizes the trade-off between data rate and handovers via $R_{ij} = W_j \log_2(1 + \mathrm{SINR}_{ij})$ and $WR_{ij} = \frac{R_{ij}}{\min(Q_i,n_i)}(1-\mu)$. Simulations on a highway with parameterized vehicle and base-station deployments show that the hybrid method converges faster and achieves higher average rewards than a conventional DDQN baseline, while reducing collision and handover rates. The results illustrate the potential of LLMs for explainable, data-driven optimization in complex, coupled cyber-physical networks.
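As a rough illustration of the two rate expressions above, the following minimal sketch evaluates them for plain numeric inputs. The summary does not define $Q_i$, $n_i$, or $\mu$; reading them as a quota, a count of users sharing the link, and a handover penalty in $[0, 1]$ is an assumption, and the function and parameter names are hypothetical.

```python
import math


def data_rate(bandwidth_hz: float, sinr: float) -> float:
    """R_ij = W_j * log2(1 + SINR_ij); sinr is a linear ratio, not dB."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def weighted_rate(rate: float, quota: float, num_users: float, mu: float) -> float:
    """WR_ij = R_ij / min(Q_i, n_i) * (1 - mu).

    Assumption: quota (Q_i) and num_users (n_i) are positive, and mu is a
    penalty factor in [0, 1]. The source does not spell these out here.
    """
    return rate / min(quota, num_users) * (1.0 - mu)


# Example with made-up numbers: 1 MHz of bandwidth and a linear SINR of 3.
r = data_rate(1e6, 3.0)
wr = weighted_rate(r, quota=5, num_users=2, mu=0.5)
```

With these illustrative inputs, a larger `mu` or more sharing users lowers the weighted rate, which is the trade-off the DDQN agent is described as optimizing.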

Abstract

Large language models (LLMs) have recently attracted considerable interest for their strong reasoning and comprehension capabilities. This work explores applying LLMs to vehicular networks, aiming to jointly optimize vehicle-to-infrastructure (V2I) communications and autonomous driving (AD) policies. We deploy LLMs for AD decision-making to maximize traffic flow and avoid collisions, and use a double deep Q-learning algorithm (DDQN) for V2I optimization to maximize the received data rate and reduce frequent handovers. For LLM-enabled AD, we use the Euclidean distance to retrieve previously explored AD experiences, so that the LLM can learn from past good and bad decisions. The LLM-based AD decisions then become part of the state in the V2I problem, and DDQN optimizes the V2I decisions accordingly. The AD and V2I decisions are then optimized iteratively until convergence. This iterative approach explores the interaction between LLMs and conventional reinforcement learning techniques and indicates the potential of LLMs for network optimization and management. Simulations demonstrate that the proposed hybrid LLM-DDQN approach outperforms the conventional DDQN algorithm, with faster convergence and higher average rewards.
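The Euclidean-distance retrieval of past AD experiences mentioned above could look roughly like the sketch below. The experience layout (a numeric state vector with an action and a good/bad label), the field names, and the choice of `k` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def retrieve_nearest(query_state, pool, k=2):
    """Return the k stored experiences closest to query_state.

    Closeness is the Euclidean distance between state vectors.
    Each pool entry is a hypothetical dict with "state", "action",
    and "outcome" keys.
    """
    ranked = sorted(pool, key=lambda exp: math.dist(query_state, exp["state"]))
    return ranked[:k]


# Hypothetical experience pool; states are made-up two-dimensional features.
pool = [
    {"state": [0.0, 0.0], "action": "keep_lane", "outcome": "good"},
    {"state": [5.0, 5.0], "action": "change_left", "outcome": "bad"},
    {"state": [1.0, 1.0], "action": "slow_down", "outcome": "good"},
]

demos = retrieve_nearest([0.9, 1.2], pool, k=2)
```

The retrieved entries, including their good or bad labels, would then be placed in the LLM prompt as demonstrations for the current driving situation.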
