CommGPT: A Graph and Retrieval-Augmented Multimodal Communication Foundation Model
Feibo Jiang, Wanyun Zhu, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Octavia A. Dobre
TL;DR
Frames 6G as a domain where LLMs must handle domain-specific knowledge, multimodal data, and multi-scale retrieval. Proposes CommGPT, a multimodal foundation model trained on a dedicated CommData dataset and augmented with a Graph and Retrieval-Augmented Generation (GRG) framework that fuses a knowledge graph with a vector-based RAG system. Demonstrates that CommGPT, especially the GRG variant, achieves superior performance on 3GPP-based telecom QA tasks and outperforms several open- and closed-source baselines. The work advances telecom-specific AI by enabling accurate, multimodal reasoning and knowledge integration without frequent large-scale model retraining, supporting scalable deployment in 6G networks.
Abstract
Large Language Models (LLMs) possess human-level cognitive and decision-making capabilities, making them a key technology for 6G. However, applying LLMs to the communication domain faces three major challenges: 1) inadequate communication data; 2) restricted input modalities; and 3) difficulty in knowledge retrieval. To overcome these issues, we propose CommGPT, a multimodal foundation model designed specifically for communications. First, we create high-quality pretraining and fine-tuning datasets tailored to the communication domain, enabling the LLM to undergo further pretraining and fine-tuning on communication concepts and knowledge. Then, we design a multimodal encoder to understand and process information from various input modalities. Next, we construct a Graph and Retrieval-Augmented Generation (GRG) framework that efficiently couples a Knowledge Graph (KG) with Retrieval-Augmented Generation (RAG) for multi-scale learning. Finally, we demonstrate the feasibility and effectiveness of CommGPT through experimental validation.
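
To make the idea of coupling a KG with RAG concrete, the following is a minimal sketch and not the authors' GRG implementation. It assumes a toy in-memory corpus, a handful of hand-written triples, and a bag-of-words cosine similarity standing in for learned embeddings. All names here (DOCS, TRIPLES, build_context, and the helper functions) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for chunked telecom documents.
DOCS = [
    "The gNB is the 5G base station that connects user equipment to the core network.",
    "The AMF handles registration and mobility management in the 5G core.",
    "Beamforming focuses radio energy toward a specific user to improve signal quality.",
]

# Hypothetical knowledge-graph triples: (head, relation, tail).
TRIPLES = [
    ("gNB", "connects_to", "AMF"),
    ("AMF", "part_of", "5G core"),
    ("UE", "served_by", "gNB"),
]


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [w.strip(".,?").lower() for w in text.split()]


def bow(text):
    """Bag-of-words vector; an assumption replacing a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_chunks(query, docs, k=1):
    """Vector-style retrieval: return the k passages most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


def retrieve_triples(query, triples):
    """Graph-style retrieval: keep triples whose head or tail is named in the query."""
    words = set(tokenize(query))
    return [t for t in triples if t[0].lower() in words or t[2].lower() in words]


def build_context(query, docs, triples, k=1):
    """Fuse graph facts and retrieved passages into one prompt for an LLM."""
    facts = [f"{h} {r} {t}" for h, r, t in retrieve_triples(query, triples)]
    chunks = retrieve_chunks(query, docs, k)
    return (
        "Graph facts:\n" + "\n".join(facts)
        + "\n\nPassages:\n" + "\n".join(chunks)
        + "\n\nQuestion: " + query
    )


if __name__ == "__main__":
    print(build_context("What does the AMF do during registration?", DOCS, TRIPLES))
```

In this sketch the graph lookup supplies short relational facts about entities named in the question, while the similarity search supplies a longer supporting passage. Both are placed in a single prompt, which is one simple way to read "multi-scale" retrieval. A real system would presumably use a trained embedding model and a graph store in place of these toy stand-ins.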
