Learning from the Best: Active Learning for Wireless Communications
Nasim Soltani, Jifan Zhang, Batool Salehi, Debashri Roy, Robert Nowak, Kaushik Chowdhury
TL;DR
The paper addresses the high labeling cost of over-the-air RF datasets for deep learning in wireless PHY tasks by introducing active learning (AL) as a label-efficient approach. It provides a two-dimensional taxonomy of deep AL for PHY and demonstrates a mmWave beam selection case study using the GALAXY algorithm to tackle extreme class imbalance on the multi-modal FLASH dataset. Empirical results show that active learning can achieve the same test accuracy with up to 50% fewer labels, highlighting practical gains in labeling efficiency for real-world RF data. The work also sketches promising future directions, including digital twins, quantum communications, privacy-aware O-RAN settings, and training-cost optimization through continual learning.
Abstract
Collecting an over-the-air wireless communications training dataset for deep learning-based communication tasks is relatively simple. However, labeling the dataset requires expert involvement and domain knowledge, may involve private intellectual property, and is often computationally and financially expensive. Active learning is an emerging area of research in machine learning that aims to reduce the labeling overhead without accuracy degradation. Active learning algorithms identify the most critical and informative samples in an unlabeled dataset and label only those samples, instead of the complete set. In this paper, we introduce active learning for deep learning applications in wireless communications, and present its different categories. We present a case study of deep learning-based mmWave beam selection, where labeling is performed by a compute-intensive algorithm based on exhaustive search. We evaluate the performance of different active learning algorithms on a publicly available multi-modal dataset whose modalities include image and LiDAR. Our results show that using an active learning algorithm for class-imbalanced datasets can reduce labeling overhead by up to 50% for this dataset while maintaining the same accuracy as classical training.
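
The selection step described in the abstract, picking the most informative unlabeled samples and labeling only those, can be illustrated with a minimal sketch. It uses generic margin-based uncertainty sampling, chosen here only for brevity; it is not necessarily any of the algorithms evaluated in the paper. The function names, the `budget` parameter, and the probability values are all hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_for_labeling(pool_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` pool samples with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled samples over three beam classes.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, low labeling value
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
]

# Only these samples would be sent to the expensive labeling procedure.
chosen = select_for_labeling(pool_probs, budget=2)
```

In a full active learning loop, the chosen samples would be labeled (in the case study, by the exhaustive beam search), added to the training set, and the model retrained before the next selection round.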
