Learning from the Best: Active Learning for Wireless Communications

Nasim Soltani, Jifan Zhang, Batool Salehi, Debashri Roy, Robert Nowak, Kaushik Chowdhury

TL;DR

The paper addresses the high labeling cost of over-the-air RF datasets for deep learning in wireless PHY tasks by introducing active learning as a label-efficient approach. It provides a two-dimensional taxonomy of deep AL for PHY and demonstrates a mmWave beam selection case study using the GALAXY algorithm to tackle extreme class imbalance on a multi-modal FLASH dataset. Empirical results show that active learning can achieve the same test accuracy with up to 50% fewer labels, highlighting practical gains in labeling efficiency for real-world RF data. The work also sketches promising future directions, including digital twins, quantum communications, privacy-aware O-RAN settings, and training-cost optimization through continual learning.

Abstract

Collecting an over-the-air wireless communications training dataset for deep learning-based communication tasks is relatively simple. However, labeling the dataset requires expert involvement and domain knowledge, may involve proprietary intellectual property, and is often computationally and financially expensive. Active learning is an emerging area of research in machine learning that aims to reduce the labeling overhead without accuracy degradation. Active learning algorithms identify the most critical and informative samples in an unlabeled dataset and label only those samples, instead of the complete set. In this paper, we introduce active learning for deep learning applications in wireless communications, and present its different categories. We present a case study of deep learning-based mmWave beam selection, where labeling is performed by a compute-intensive algorithm based on exhaustive search. We evaluate the performance of different active learning algorithms on a publicly available multi-modal dataset whose modalities include image and LiDAR. Our results show that using an active learning algorithm for class-imbalanced datasets can reduce labeling overhead by up to 50% for this dataset while maintaining the same accuracy as classical training.
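
The select-then-label idea described in the abstract can be illustrated with a minimal sketch of a pool-based active learning loop. This is not the paper's implementation. It assumes a toy nearest-centroid classifier in place of a deep model, least-confidence sampling as the selection rule, and synthetic 2-D data in place of RF samples. The names `fit_centroids`, `predict_proba`, `active_learning_loop`, and `oracle` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy stand-in for a deep model: one mean vector per class seen so far."""
    classes = np.unique(y)
    return np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_proba(X, centroids):
    """Softmax over negative squared distances to each class centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def active_learning_loop(X_pool, oracle, n_rounds, batch_size, seed=0):
    """Label a random seed batch, then repeatedly query the least confident samples.

    `oracle(i)` is an assumed callback that returns the label of pool sample i;
    it stands for the expensive labeling step (e.g. an exhaustive search).
    """
    rng = np.random.default_rng(seed)
    initial = rng.choice(len(X_pool), size=batch_size, replace=False)
    labels = {int(i): oracle(int(i)) for i in initial}
    for _ in range(n_rounds):
        idx = np.array(sorted(labels))
        y = np.array([labels[i] for i in idx])
        centroids = fit_centroids(X_pool[idx], y)
        unlabeled = np.array([i for i in range(len(X_pool)) if i not in labels])
        if len(unlabeled) == 0:
            break
        # Confidence = highest predicted class probability; lowest values are queried.
        conf = predict_proba(X_pool[unlabeled], centroids).max(axis=1)
        query = unlabeled[np.argsort(conf)[:batch_size]]
        for i in query:
            labels[int(i)] = oracle(int(i))
    return labels

# Synthetic pool: two Gaussian blobs standing in for an unlabeled dataset.
data_rng = np.random.default_rng(1)
X_pool = np.vstack([data_rng.normal(0, 1, (200, 2)), data_rng.normal(4, 1, (200, 2))])
y_true = np.array([0] * 200 + [1] * 200)

labels = active_learning_loop(X_pool, lambda i: int(y_true[i]), n_rounds=4, batch_size=10)
print(len(labels), "of", len(X_pool), "samples labeled")
```

Only the samples that reach the oracle incur labeling cost, which is the quantity active learning tries to reduce. Swapping the selection rule (random, confidence-based, or an imbalance-aware method such as GALAXY) changes which samples are queried but not the structure of the loop.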

Paper Structure (21 sections, 6 figures)

Figures (6)

  • Figure 1: Top: Classical learning where the complete dataset is labeled and used for training. Bottom: Active learning where a subset of the dataset is selected and labeled for training.
  • Figure 2: Categorization of active learning algorithms from two different and parallel perspectives of: A. Availability of RF dataset, and B. PHY problem type. In this paper, we study a PHY use case for active learning that falls within the categories marked with red boxes.
  • Figure 3: Population per class across the 30711 samples in the dataset, showing extreme class imbalance. The smallest class is class 8 with 20 members and the largest is class 18 with 6882 members.
  • Figure 4: GALAXY algorithm where (1) Uncertainty scores are calculated and the graphs are composed with sorted uncertainty scores for each class X versus all other classes as class Y, (2) Bisectable segments are identified, and (3) Bisectable segments are prioritized and the samples around all identified cuts in the bisectable segments are queried based on the priority.
  • Figure 5: Average test set accuracy measured in each active learning iteration for two modalities of image and LiDAR, each with three different algorithms of random sampling, confidence sampling, and GALAXY.
  • ...and 1 more figure
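
To put the class imbalance described in the Figure 3 caption in numbers, the short calculation below uses only the three counts quoted there (30711 samples, 20 in the smallest class, 6882 in the largest). The variable names are illustrative, and the other per-class counts are not given in this summary, so they are not reproduced.

```python
# Counts quoted in the Figure 3 caption.
total_samples = 30711
smallest_class = 20    # class 8
largest_class = 6882   # class 18

# Imbalance ratio: how many times larger the biggest class is than the smallest.
imbalance_ratio = largest_class / smallest_class

# Share of the whole dataset taken up by each of the two extreme classes.
largest_share = largest_class / total_samples
smallest_share = smallest_class / total_samples

print(f"imbalance ratio: {imbalance_ratio:.1f}")
print(f"largest class share: {largest_share:.2%}")
print(f"smallest class share: {smallest_share:.3%}")
```

With a ratio this large, a uniformly random labeling batch will rarely contain samples from the smallest classes, which is the situation an imbalance-aware selection rule is meant to address.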