Retrieval Augmented Generation with Multi-Modal LLM Framework for Wireless Environments
Muhammad Ahmed Mohsin, Ahsan Bilal, Sagnik Bhattacharya, John M. Cioffi
TL;DR
This work integrates retrieval-augmented generation (RAG) with multimodal sensing to optimize wireless environments. It converts sensor data (images, GPS, LiDAR) into a unified textual and vector-based knowledge base using GPT-4o, YOLO, and ChromaDB, so that an LLM can make decisions within real-time latency limits. Evaluations on GPT-4 and Gemini show improvements of 8%, 8%, 10%, 7%, and 12% in relevancy, faithfulness, completeness, similarity, and accuracy, respectively, over conventional LLM-based designs, while maintaining efficiency suitable for setups resembling integrated sensing and communication (ISAC). The framework shows practical potential for context-aware wireless optimization in complex 6G scenarios; future work targets broader ISAC integration and real-time deployment.
Abstract
Future wireless networks aim to deliver higher data rates and lower power consumption while ensuring seamless connectivity, which requires robust optimization. Large language models (LLMs) have been deployed for generalized optimization scenarios. To take advantage of generative AI (GAI) models, we propose retrieval-augmented generation (RAG) for multi-sensor wireless environment perception. Using domain-specific prompt engineering, we apply RAG to efficiently harness multimodal data inputs from sensors in a wireless environment. We propose key pre-processing pipelines, including image-to-text conversion, object detection, and distance calculation, that turn multi-sensor data into multimodal RAG input and yield a unified vector database, which is crucial for optimizing LLMs in global wireless tasks. Our evaluation, conducted with OpenAI's GPT and Google's Gemini models, demonstrates improvements of 8%, 8%, 10%, 7%, and 12% in relevancy, faithfulness, completeness, similarity, and accuracy, respectively, compared to conventional LLM-based designs. Furthermore, our RAG-based LLM framework with vectorized databases is computationally efficient, providing real-time convergence under latency constraints.
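
The pre-processing and retrieval flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields, function names, and sample coordinates are hypothetical, the detected-object lists stand in for the output of an object detector, and a toy bag-of-words vector with an in-memory list replaces the learned embeddings and vector database that a real system would use. The distance step is assumed here to be a haversine calculation on GPS fixes.

```python
import math
from collections import Counter


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def describe(record, bs_pos):
    """Turn one (hypothetical) sensor record into a text line for the knowledge base."""
    d = haversine_m(*bs_pos, *record["gps"])
    objs = ", ".join(record["objects"]) or "no objects"
    return f"Unit {record['id']} is {d:.0f} m from the base station; camera detects {objs}."


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a learned embedding."""
    for ch in ";,.":
        text = text.replace(ch, " ")
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=1):
    """Return the k stored descriptions most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, context):
    """Assemble the retrieved context and the question into one LLM prompt."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Context from sensors:\n{lines}\n\nQuestion: {query}"


# Hypothetical sample data: one base station and two sensing units.
bs_pos = (33.4200, -111.9300)
records = [
    {"id": 1, "gps": (33.4200, -111.9300), "objects": ["truck", "bus"]},
    {"id": 2, "gps": (33.4205, -111.9300), "objects": []},
]
docs = [describe(r, bs_pos) for r in records]
query = "Which unit has a truck blocking the link?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
```

The resulting `prompt` string would then be passed to whichever LLM is in use. Keeping retrieval separate from prompt assembly means the toy `embed` and `retrieve` functions can be swapped for a real embedding model and vector store without touching the rest of the flow.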
