MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals

Junyu Shen; Zhendong She; Chenghanyu Zhang; Yuchuang Sun; Luqing Luo; Dingwei Tan; Zonghao Guo; Bo Guo; Zehua Han; Wupeng Xie; Yaxin Mu; Peng Zhang; Peipei Li; Fengxiang Wang; Yangang Sun; Maosong Sun

Abstract

The Multimodal Large Language Model (MLLM) paradigm offers a promising blueprint for advancing the electromagnetic (EM) domain. However, prevailing approaches often deviate from the native MLLM paradigm, relying instead on task-specific or pipelined architectures that fundamentally limit model performance and generalization. Fully realizing the potential of MLLMs in the EM domain requires overcoming three main challenges: (1) Data: high-quality datasets that pair EM signals with descriptive text annotations for MLLM pre-training are scarce; (2) Benchmark: there is no comprehensive benchmark for systematically evaluating and comparing models on EM signal-to-text tasks; (3) Model: models are fragile in low Signal-to-Noise Ratio (SNR) environments, where key signal features can be obscured, leading to significant performance degradation. To address these challenges, we make three contributions that establish a foundation for MLLMs in the EM domain. First, to overcome data scarcity, we construct and release EM-100k, a large-scale dataset comprising over 100,000 EM signal-text pairs. Second, to enable rigorous and standardized evaluation, we propose EM-Bench, the most comprehensive benchmark featuring diverse downstream tasks spanning from perception to reasoning. Finally, to tackle the core modeling challenge, we present MERLIN, a novel training framework designed not only to align low-level signal representations with high-level semantic text, but also to explicitly enhance model robustness and performance in challenging low-SNR environments. Comprehensive experiments validate our method, showing that MERLIN achieves state-of-the-art performance on EM-Bench and exhibits remarkable robustness in low-SNR settings.
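To make the low-SNR setting concrete, the following is a minimal, dependency-free sketch of how a clean complex IQ sequence can be corrupted with additive white Gaussian noise at a chosen SNR in dB. The abstract does not state how low-SNR conditions are generated, so the AWGN noise model, the test tone, and the helper names `mean_power`, `add_awgn`, and `measured_snr_db` are all assumptions made for illustration, not the authors' procedure.

```python
import cmath
import math
import random


def mean_power(samples):
    """Average power of a sequence of complex samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def add_awgn(iq, snr_db, rng):
    """Return a noisy copy of `iq` at the requested SNR (hypothetical helper).

    Assumes complex white Gaussian noise; the paper's noise model may differ.
    """
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = mean_power(iq) / (10.0 ** (snr_db / 10.0))
    # Split the noise power evenly between the I and Q components.
    sigma = math.sqrt(noise_power / 2.0)
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in iq]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, given the clean reference and its noisy version."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative unit-power complex tone standing in for an EM signal.
    tone = [cmath.exp(2j * math.pi * 0.05 * n) for n in range(4096)]
    for target in (20.0, 0.0, -10.0):
        noisy = add_awgn(tone, target, rng)
        print(f"target {target:+.1f} dB, measured {measured_snr_db(tone, noisy):+.2f} dB")
```

At 0 dB the noise carries as much power as the signal, and at negative SNR it carries more, which is the regime in which signal features can be obscured.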

Paper Structure (21 sections, 6 equations, 15 figures, 5 tables)

Introduction
Related Work
EM-100k & EM-Bench
Evaluation Dimensions
Data Collection and QA Pairs Construction
Analysis
MERLIN Framework
Motivation
Model Architecture
Two-Stage Training Paradigm
Experiment
Experimental Setup
Main Results
Ablation Study
Conclusion
...and 6 more sections

Figures (15)

Figure 1: The hierarchical evaluation framework of EM-Bench, which systematically assesses the perception and reasoning capabilities of MLLMs on electromagnetic IQ signals across 3 levels and 14 sub-tasks.
Figure 2: The two-stage construction pipeline for the EM-100k fine-tuning dataset and the EM-Bench evaluation benchmark. Stage 1 builds a large-scale, expert-validated IQ data corpus. Stage 2 covers the two distinct generation processes: direct, large-scale formatting for EM-100k, and a rigorously expert-validated pathway for the high-quality EM-Bench.
Figure 3: The architecture and training framework of MERLIN. (1) The baseline model architecture consists of a Signal Encoder, a Projector, and a LLM. (2) The knowledge distillation framework enhances low-SNR robustness by using a frozen high-SNR teacher model to guide a student model. (3) The Denoising Subspace Module (DSM) facilitates effective distillation by projecting noisy signal features into a clean, noise-invariant feature space.
Figure 4: Motivational analysis demonstrating that low-SNR degradation is a feature-collapse problem.
Figure 5: Detailed performance comparison across 10 sub-tasks. The red line denotes MERLIN and the gray line denotes the Stage-1 baseline. MERLIN demonstrates significant improvements in complex parameter estimation tasks (e.g., Duty Cycle, Pulse Width) across all SNR levels, while maintaining robust performance in classification tasks.
...and 10 more figures
