Self-supervised Learning for Clustering of Wireless Spectrum Activity
Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna, Mihael Mohorčič
TL;DR
The paper addresses the labeling bottleneck in real-world wireless spectrum analysis by using self-supervised learning (SSL) to cluster spectrogram segments without labels. It adapts a DeepCluster-inspired SSL framework to spectrum data, compares CNN and Vision Transformer (ViT) feature extractors, and benchmarks them against PCA and a CNN autoencoder baseline. SSL yields higher-quality embeddings and better clustering: embedding dimensionality drops by about two orders of magnitude, model complexity by roughly one order of magnitude, and clustering performance improves by a factor of about 2 to 2.5 across metrics. The results also indicate that CNN features may outperform ViT features on spectrograms with low content, and the paper provides a methodology for evaluating transmission clustering in this domain. Overall, the work demonstrates a scalable, data-efficient approach to automatically exploring and cataloguing spectrum activity in unlabeled real-world environments, with implications for cognitive radio and spectrum management systems.
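The DeepCluster idea referred to above alternates two steps: cluster the current embeddings with K-means, then train the network to predict those cluster indices as pseudo-labels. The sketch below shows one round of that loop in NumPy under simplifying assumptions that are not taken from the paper: the spectrogram segments are synthetic, `extract_features` is a hypothetical stand-in (a fixed random projection) for the CNN or ViT backbone, and only a softmax classification head is trained. The actual method also back-propagates the pseudo-label loss into the backbone and repeats the loop every epoch.

```python
import numpy as np


def kmeans(x, k, n_init=10, n_iter=50, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centres) of the best restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # k-means++: each new centre is sampled with probability proportional to
        # its squared distance from the nearest centre chosen so far.
        centres = [x[rng.integers(len(x))]]
        for _ in range(1, k):
            d2 = ((x[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
            centres.append(x[rng.choice(len(x), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            labels = ((x[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):  # keep the old centre if a cluster empties
                    centres[j] = x[labels == j].mean(axis=0)
        inertia = ((x - centres[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centres)
    return best[1], best[2]


def extract_features(segments, proj):
    """Hypothetical backbone stand-in: flatten each segment and apply a fixed linear map."""
    return segments.reshape(len(segments), -1) @ proj


def train_head(z, pseudo_labels, k, lr=0.5, epochs=200):
    """Fit a softmax classifier to the pseudo-labels by full-batch gradient descent."""
    n, d = z.shape
    w, b = np.zeros((d, k)), np.zeros(k)
    y = np.eye(k)[pseudo_labels]  # one-hot targets
    for _ in range(epochs):
        logits = z @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - y) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * z.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def synthetic_segments(n_per_class=50, size=16, seed=0):
    """Toy spectrogram segments (time x frequency): each class occupies one band."""
    rng = np.random.default_rng(seed)
    segments, truth = [], []
    for cls, (lo, hi) in enumerate([(0, 4), (6, 10), (12, 16)]):
        for _ in range(n_per_class):
            seg = rng.normal(0.0, 0.1, size=(size, size))  # noise floor
            seg[:, lo:hi] += 1.0  # active frequency band
            segments.append(seg)
            truth.append(cls)
    return np.stack(segments), np.array(truth)


# One DeepCluster-style round: embed -> cluster -> train on pseudo-labels.
segments, truth = synthetic_segments()
proj = np.random.default_rng(1).normal(0.0, 1.0 / 16, size=(16 * 16, 8))  # 256-d -> 8-d
z = extract_features(segments, proj)
pseudo, _ = kmeans(z, k=3)
w, b = train_head(z, pseudo, k=3)
acc = float(((z @ w + b).argmax(axis=1) == pseudo).mean())
print(f"head agrees with pseudo-labels on {acc:.0%} of segments")
```

Because K-means cluster indices are arbitrary and can permute between rounds, the pseudo-labels only need to be consistent within one round; in the original DeepCluster the classification head is re-initialised after each re-clustering for this reason.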
Abstract
In recent years, much work has applied machine learning to wireless spectrum data for domain-specific problems in cognitive radio networks, such as anomaly detection, modulation classification, technology classification and device fingerprinting. Most of these solutions rely on labeled data, created in a controlled manner and processed with supervised learning approaches. However, spectrum data measured in real-world environments is highly nondeterministic, so labeling it is a laborious and expensive process that requires domain expertise; this is one of the main drawbacks of using supervised learning approaches in this domain. In this paper, we investigate the use of self-supervised learning (SSL) for exploring spectrum activities in real-world unlabeled data. In particular, we compare the performance of two SSL models, one based on a reference DeepCluster architecture and one adapted for spectrum activity identification and clustering, with a baseline model based on the K-means clustering algorithm. We show that the SSL models achieve superior performance in terms of both the quality of the extracted features and clustering performance. With the SSL models, we reduce the size of the feature vectors by two orders of magnitude while improving performance by a factor of 2 to 2.5 across the evaluation metrics, supported by visual assessment. Additionally, we show that adapting the reference SSL architecture to the domain data reduces model complexity by one order of magnitude while preserving or even improving clustering performance.
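Since the measured spectrum data carry no labels, clustering quality has to be judged with internal metrics that use only the embeddings and the cluster assignments. The abstract does not name its evaluation metrics, so the silhouette coefficient below is an assumed, commonly used example rather than the paper's exact evaluation. `mean_silhouette` is a self-written NumPy function for illustration, not a library API.

```python
import numpy as np


def mean_silhouette(x, labels):
    """Mean silhouette coefficient with Euclidean distance (singleton clusters score 0)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if not 2 <= len(clusters) <= len(x) - 1:
        raise ValueError("need between 2 and n_samples - 1 clusters")
    # Full pairwise distance matrix: O(n^2) memory, fine for small embedding sets.
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    scores = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        n_own = own.sum() - 1  # other members of i's cluster
        if n_own == 0:
            continue
        # a: mean distance to the rest of i's own cluster (dist[i, i] is 0).
        a = dist[i, own].sum() / n_own
        # b: mean distance to the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


# Toy check: two compact, well-separated blobs of 2-D "embeddings".
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
matching = np.array([0] * 20 + [1] * 20)  # assignment that follows the blobs
alternating = np.array([0, 1] * 20)       # assignment that ignores them
good = mean_silhouette(x, matching)
bad = mean_silhouette(x, alternating)
print(f"matching: {good:.2f}, alternating: {bad:.2f}")
```

Values near 1 indicate compact, well-separated clusters and values near 0 indicate overlapping ones. Because the distance matrix grows quadratically with the number of segments, large spectrum datasets may need to be scored on a random subsample.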
