Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method
Ruichen Zhang, Changyuan Zhao, Hongyang Du, Dusit Niyato, Jiacheng Wang, Suttinee Sawadsitang, Xuemin Shen, Dong In Kim
TL;DR
This work addresses the joint optimization of data transmission and decision-making in embodied AI vehicular networks under bandwidth constraints. It couples LLAVA-based semantic extraction, which compresses raw image data captured by vehicles into compact, actionable text, with a GAE-PPO-driven reinforcement learning framework that adapts transmission policies in real time, guided by a QoE metric based on the Weber-Fechner law. The key contributions are a QoE-aware optimization problem formulation, an LLAVA-based semantic pipeline with attention-grounded extraction, and a stable GAE-PPO solver with a detailed MDP for V2I/V2V resource management. Simulation results show QoE gains of up to 36% over DDPG, up to 47% fewer steps to convergence than pure PPO, and a QoE improvement of up to 61.4% when scaling from 4 to 8 vehicles, supporting the approach's effectiveness and scalability for future 6G IoV deployments.
Abstract
This paper investigates adaptive transmission strategies in embodied AI-enhanced vehicular networks by integrating large language models (LLMs) for semantic information extraction and deep reinforcement learning (DRL) for decision-making. The proposed framework aims to optimize both data transmission efficiency and decision accuracy by formulating an optimization problem that incorporates the Weber-Fechner law, which serves as a metric for balancing bandwidth utilization and quality of experience (QoE). Specifically, we employ the large language and vision assistant (LLAVA) model to extract critical semantic information from raw image data captured by embodied AI agents (i.e., vehicles), reducing the transmitted data size by more than 90% while retaining the content essential for vehicular communication and decision-making. In the dynamic vehicular environment, we employ a generalized advantage estimation-based proximal policy optimization (GAE-PPO) method to stabilize decision-making under uncertainty. Simulation results show that attention maps from LLAVA highlight the model's focus on relevant image regions, enhancing semantic representation accuracy. Additionally, our proposed transmission strategy improves QoE by up to 36% compared to DDPG and accelerates convergence, reducing the required steps by up to 47% compared to pure PPO. Further analysis indicates that adapting the semantic symbol length provides an effective trade-off between transmission quality and bandwidth, achieving up to a 61.4% improvement in QoE when scaling from 4 to 8 vehicles.
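
To make the GAE component of GAE-PPO concrete, the following is a minimal sketch of the standard generalized advantage estimation recursion, not the authors' implementation. The function name `compute_gae`, its argument layout, and the default values of `gamma` and `lam` are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout (illustrative sketch).

    rewards: length-T sequence of per-step rewards (e.g., a QoE-based reward).
    values:  length-(T+1) sequence of critic estimates; the last entry is the
             bootstrap value of the state reached after the final step.
    dones:   length-T sequence of 0/1 flags marking episode termination.
    gamma, lam: discount factor and GAE smoothing parameter (assumed defaults).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dones = np.asarray(dones, dtype=float)

    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Walk backwards so each step reuses the accumulated future advantage.
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of residuals, reset at episode ends.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae

    # Value targets for the critic update.
    returns = advantages + values[:-1]
    return advantages, returns
```

Setting `lam=0` reduces the estimate to the one-step TD residual (low variance, higher bias), while `lam=1` recovers the discounted return minus the value baseline (low bias, higher variance). Intermediate values trade these off, which is the usual motivation for pairing GAE with PPO in noisy environments.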
