RF-GPT: Teaching AI to See the Wireless World

Hang Zou, Yu Tian, Bohao Wang, Lina Bariah, Samson Lasaulce, Chongwen Huang, Mérouane Debbah

TL;DR

RF-GPT is a radio-frequency language model (RFLM) that uses the visual encoders of multimodal LLMs to process and understand RF spectrograms. It achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.

Abstract

Large language models (LLMs) and multimodal models have become powerful general-purpose reasoning systems. However, radio-frequency (RF) signals, which underpin wireless systems, are still not natively supported by these models. Existing LLM-based approaches for telecom focus mainly on text and structured data, while conventional RF deep-learning models are built separately for specific signal-processing tasks, highlighting a clear gap between RF perception and high-level reasoning. To bridge this gap, we introduce RF-GPT, a radio-frequency language model (RFLM) that utilizes the visual encoders of multimodal LLMs to process and understand RF spectrograms. In this framework, complex in-phase/quadrature (IQ) waveforms are mapped to time-frequency spectrograms and then passed to pretrained visual encoders. The resulting representations are injected as RF tokens into a decoder-only LLM, which generates RF-grounded answers, explanations, and structured outputs. To train RF-GPT, we perform supervised instruction fine-tuning of a pretrained multimodal LLM using a fully synthetic RF corpus. Standards-compliant waveform generators produce wideband scenes for six wireless technologies, from which we derive time-frequency spectrograms, exact configuration metadata, and dense captions. A text-only LLM then converts these captions into RF-grounded instruction-answer pairs, yielding roughly 12,000 RF scenes and 0.625 million instruction examples without any manual labeling. Across benchmarks for wideband modulation classification, overlap analysis, wireless-technology recognition, WLAN user counting, and 5G NR information extraction, RF-GPT achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.
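
The mapping from complex IQ samples to a time-frequency spectrogram can be illustrated with a short-time Fourier transform. The following is a minimal sketch, not the authors' implementation: the function name `iq_to_spectrogram`, the FFT size, the hop length, the Hann window, and the dB scaling are all assumptions chosen for illustration.

```python
import numpy as np

def iq_to_spectrogram(iq, n_fft=256, hop=128):
    """Map a complex IQ waveform to a log-power spectrogram (assumed settings)."""
    iq = np.asarray(iq, dtype=np.complex128)
    window = np.hanning(n_fft)  # assumed window choice
    n_frames = 1 + (len(iq) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [iq[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Two-sided FFT per frame; fftshift orders bins from negative to positive
    # frequency, which suits complex baseband signals.
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    power_db = 10.0 * np.log10(np.abs(spec) ** 2 + 1e-12)
    return power_db.T  # shape: (frequency bins, time frames)

# Illustrative input: a complex tone at +0.25 cycles per sample.
n = np.arange(4096)
tone = np.exp(2j * np.pi * 0.25 * n)
spec_db = iq_to_spectrogram(tone)
```

The resulting 2-D array can be rendered as an image and handed to a pretrained vision encoder; how RF-GPT normalizes or colorizes its spectrograms is not stated in the abstract.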

Paper Structure (20 sections, 31 equations, 14 figures)

Figures (14)

  • Figure 1: Basic structure of RF-GPT, comprising a vision-based radio-frequency (RF) encoder (implemented by a vision encoder on RF spectrograms), an RF adapter (linear projection) that projects RF embeddings to the LLM dimension, and a decoder-only LLM. STFT stands for short-time Fourier transform.
  • Figure 2: Wireless technologies gallery: time-frequency spectrograms of six RF signal types, namely 5G NR, 4G LTE, 3G UMTS, WLAN, DVB-S2, and Bluetooth (from top left to bottom right).
  • Figure 3: Our fine-grained caption strategy for RF spectrograms. Metadata-derived information is grouped into five levels (summary, global visual, global context, signal visual, signal context), which are selectively combined when generating captions and instructions.
  • Figure 4: Instruction synthesis pipeline. A deterministic captioner converts metadata into a dense caption, which is then fed to a text-only LLM to generate multiple RF-grounded instruction–answer pairs.
  • Figure 5: Comparison between a general-purpose VLM and RF-GPT on a 5G DL spectrogram with only one UE. A general-purpose VLM such as Qwen2.5-VL-7B has no RF prior and fails to extract any useful information from the spectrogram.
  • ...and 9 more figures
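
The RF adapter described in the Figure 1 caption is a linear projection from the encoder's embedding width to the LLM's embedding width. The sketch below shows only that shape bookkeeping. All sizes, the random weights, the name `rf_adapter`, and the choice to place RF tokens before the text embeddings are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 spectrogram patches, encoder width 32, LLM width 64.
num_patches, d_enc, d_llm = 16, 32, 64

def rf_adapter(rf_embeddings, weight, bias):
    """Linear projection of encoder outputs into the LLM embedding space."""
    return rf_embeddings @ weight + bias

# Stand-ins for vision-encoder outputs and learned adapter parameters.
rf_embeddings = rng.standard_normal((num_patches, d_enc))
weight = rng.standard_normal((d_enc, d_llm)) * 0.02
bias = np.zeros(d_llm)

rf_tokens = rf_adapter(rf_embeddings, weight, bias)

# Stand-in for 5 embedded text tokens of the instruction prompt.
text_embeddings = rng.standard_normal((5, d_llm))

# Assumed ordering: RF tokens first, then text, as one decoder input sequence.
llm_input = np.concatenate([rf_tokens, text_embeddings], axis=0)
```

Once projected, the RF tokens share the LLM's embedding dimension, so the decoder can attend over them together with ordinary text tokens.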