Masked Symbol Modeling for Demodulation of Oversampled Baseband Communication Signals in Impulsive Noise-Dominated Channels
Oguz Bedir, Nurullah Sevim, Mostafa Ibrahim, Sabit Ekin
TL;DR
This work reframes the inter-symbol interference from pulse shaping as a deterministic source of contextual information in oversampled baseband signals. It introduces Masked Symbol Modeling (MSM), a BERT-style Transformer framework trained to predict the identities of masked symbols from the surrounding samples, thereby learning the latent syntax of waveform structure. The approach is demonstrated on a demodulation-like task under impulsive Middleton Class-A noise, suggesting a path toward context-aware PHY receivers that interpret signals rather than merely detect them. If refined, MSM could improve demodulation accuracy and enable richer waveform interpretation in challenging channel conditions.
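The TL;DR names impulsive Middleton Class-A noise as the channel impairment. As a hedged sketch of the commonly used Poisson-weighted Gaussian mixture form of that model (the parameter values and the sample-wise independent draw are assumptions, not taken from the paper), each sample's state m is drawn from Poisson(A) and the sample is complex Gaussian with variance σ²(m/A + Γ)/(1 + Γ), so the average power equals σ². The function name `middleton_class_a` is hypothetical.

```python
import numpy as np

def middleton_class_a(n, A=0.1, gamma=0.01, power=1.0, rng=None):
    """Draw n complex samples of Middleton Class-A noise (Poisson-Gaussian mixture).

    A:     impulsive index (mean number of active impulse sources).
    gamma: ratio of background Gaussian power to impulsive power.
    power: total average noise power sigma^2.
    Returns (noise, m), where m is the Poisson state of each sample.

    Assumption (not from the paper): states are drawn i.i.d. per sample,
    so an impulse never spans several consecutive samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = rng.poisson(A, size=n)                       # number of active impulse sources
    var_m = power * (m / A + gamma) / (1.0 + gamma)  # state-dependent variance
    scale = np.sqrt(var_m / 2.0)                     # split evenly between I and Q
    noise = scale * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return noise, m

noise, m = middleton_class_a(200_000, A=0.1, gamma=0.01, rng=np.random.default_rng(0))
print(np.mean(np.abs(noise) ** 2))  # empirical power, expected to be near 1.0
```

For a fixed total power, a smaller A yields rarer but stronger bursts, while Γ sets the background Gaussian floor. Adding such noise to an oversampled signal corrupts isolated samples heavily, which is the regime in which a context-based receiver is meant to help.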
Abstract
Recent breakthroughs in natural language processing show that the attention mechanism in Transformer networks, trained via masked-token prediction, enables models to capture the semantic context of tokens and internalize the grammar of language. While the application of Transformers to communication systems is a burgeoning field, the notion of context within physical waveforms remains under-explored. This paper addresses that gap by re-examining the inter-symbol contribution (ISC) caused by pulse-shaping overlap. Rather than treating ISC as a nuisance, we view it as a deterministic source of contextual information embedded in oversampled complex baseband signals. We propose Masked Symbol Modeling (MSM), a framework for the physical (PHY) layer inspired by the Bidirectional Encoder Representations from Transformers (BERT) methodology. In MSM, a random subset of symbol-aligned samples is masked, and a Transformer predicts the missing symbol identifiers from the surrounding "in-between" samples. Through this objective, the model learns the latent syntax of complex baseband waveforms. We illustrate MSM's potential by applying it to the demodulation of signals corrupted by impulsive noise, where the model infers the corrupted segments from the learned context. Our results suggest a path toward receivers that interpret, rather than merely detect, communication signals, opening new avenues for context-aware PHY-layer design.
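To make the masking objective concrete: with L samples per symbol, the clean signal is x[n] = Σ_k s_k p[n − kL]. For a Nyquist pulse such as the raised cosine, p vanishes at nonzero multiples of L, so each symbol-aligned sample carries only its own symbol, while every in-between sample mixes several neighbors; that overlap is the ISC the abstract treats as context. The sketch below builds one masked training example under assumptions of our own (QPSK, a raised-cosine pulse with roll-off 0.35, 8 samples per symbol, a 15% mask ratio, and a three-channel I/Q-plus-mask-flag input). `raised_cosine` and `make_msm_example` are hypothetical helpers, and the Transformer itself is omitted.

```python
import numpy as np

# Assumed setup (not from the paper): QPSK identifiers 0..3 -> constellation points.
QPSK = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))

def raised_cosine(sps, span, beta):
    """Raised-cosine pulse, sps samples/symbol, truncated to +/- span symbol periods."""
    t = np.arange(-span * sps, span * sps + 1) / sps       # time in symbol periods
    h = np.sinc(t) * np.cos(np.pi * beta * t)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)
    h[~singular] /= denom[~singular]
    if singular.any():                                      # limit at t = +/- 1/(2*beta)
        h[singular] = (np.pi / 4) * np.sinc(1.0 / (2.0 * beta))
    return h

def make_msm_example(n_sym, sps=8, span=6, beta=0.35, mask_ratio=0.15, rng=None):
    """Build one hypothetical MSM training example from an oversampled QPSK burst.

    Returns (features, mask_idx, targets, x): features[:, :2] holds I/Q of the
    masked signal, features[:, 2] flags masked samples, mask_idx are the masked
    symbol indices, targets their identifiers, and x the clean signal.
    """
    rng = np.random.default_rng() if rng is None else rng
    ids = rng.integers(0, 4, size=n_sym)                    # ground-truth symbol identifiers
    up = np.zeros(n_sym * sps, dtype=complex)
    up[::sps] = QPSK[ids]                                   # impulses at symbol instants
    # Overlapping pulses: in-between samples mix neighboring symbols (the ISC).
    x = np.convolve(up, raised_cosine(sps, span, beta), mode="same")

    n_mask = max(1, int(round(mask_ratio * n_sym)))
    mask_idx = np.sort(rng.choice(n_sym, size=n_mask, replace=False))
    sample_mask = np.zeros(x.size, dtype=bool)
    sample_mask[mask_idx * sps] = True                      # mask only symbol-aligned samples
    x_masked = np.where(sample_mask, 0.0, x)                # in-between samples stay visible

    features = np.stack([x_masked.real, x_masked.imag, sample_mask.astype(float)], axis=-1)
    return features, mask_idx, ids[mask_idx], x

features, mask_idx, targets, x = make_msm_example(64, rng=np.random.default_rng(1))
print(features.shape, mask_idx, targets)
```

A model trained on such pairs would be scored, BERT-style, with a cross-entropy loss over the four identifiers at the masked positions only. Because the masked symbol still leaks into the unmasked in-between samples through the pulse tails, its identity remains recoverable from context; the burst length should be at least the pulse length so that `mode="same"` keeps the output aligned with the symbol grid.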
