Federated Self-Supervised Learning for Automatic Modulation Classification under Non-IID and Class-Imbalanced Data

Usman Akram, Yiyue Chen, Haris Vikalo

TL;DR

This work tackles automatic modulation classification under privacy, non-IID distributions, and scarce labels by proposing FedSSL-AMC, a federated self-supervised framework. A causal time-dilated CNN is trained with a triplet loss on unlabeled I/Q sequences to learn a shared encoder, while each client privately trains an SVM on a small labeled set for modulation recognition. The authors provide a convergence analysis for the time-smoothed federated representation learning and a separability guarantee for the downstream classifier under encoder noise, plus a discussion of mobility-induced CFO as a heterogeneity factor. Empirical results on synthetic and MIGOU over-the-air data show FedSSL-AMC consistently outperforms supervised FL baselines across heterogeneous SNR, CFO, and non-IID label partitions, with favorable trade-offs in complexity and communication. The approach advances privacy-preserving, robust AMC by decoupling representation learning from scarce-label adaptation and demonstrates practical edge deployment potential.

Abstract

Training automatic modulation classification (AMC) models on centrally aggregated data raises privacy concerns, incurs communication overhead, and often fails to confer robustness to channel shifts. Federated learning (FL) avoids central aggregation by training on distributed clients but remains sensitive to class imbalance, non-IID client distributions, and limited labeled samples. We propose FedSSL-AMC, which trains a causal, time-dilated CNN with triplet-loss self-supervision on unlabeled I/Q sequences across clients, followed by per-client SVMs on small labeled sets. We establish convergence of the federated representation learning procedure and a separability guarantee for the downstream classifier under feature noise. Experiments on synthetic and over-the-air datasets show consistent gains over supervised FL baselines under heterogeneous SNR, carrier-frequency offsets, and non-IID label partitions.
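
To make the triplet-loss self-supervision mentioned above more concrete, here is a minimal pure-Python sketch. It assumes a hinge-style loss on squared Euclidean distances with a fixed margin; the paper's exact loss form, margin, and sampling of positives and negatives may differ. The function name `triplet_margin_loss` and its parameters are hypothetical and chosen only for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_margin_loss(anchors, positives, negatives, margin=1.0):
    """Mean hinge-style triplet loss over a batch of embedding triplets.

    Each argument is a list of embedding vectors (lists of floats).
    The loss is zero for a triplet once the negative is farther from the
    anchor than the positive by at least `margin` (an assumed hyperparameter).
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = squared_distance(a, p)  # anchor vs. similar segment
        d_neg = squared_distance(a, n)  # anchor vs. dissimilar segment
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)


# A well-separated triplet contributes no loss; a badly ordered one does.
good = triplet_margin_loss([[0.0, 0.0]], [[0.0, 0.0]], [[3.0, 4.0]])
bad = triplet_margin_loss([[0.0, 0.0]], [[3.0, 4.0]], [[0.0, 0.0]])
print(good, bad)
```

In a pipeline of the kind the abstract describes, the embeddings would come from the shared encoder applied to unlabeled I/Q segments, and the per-client SVM would then be fit on encoder outputs for the small labeled set.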

Paper Structure

This paper contains 14 sections, 3 theorems, 39 equations, 6 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

The variance of the stochastic gradients $\nabla_\Theta f_c(\Theta)$ is bounded as [...]. In the high-SNR and low-regularization limit, i.e., as $\gamma \rightarrow \infty$ and $\lambda \rightarrow 0$, this bound simplifies to [...].

Figures (6)

  • Figure 1: Time-dilated stacked convolutional layers. In layer $k$, the neurons connected to a neuron in layer $k+1$ are spaced $2^k$ apart, resulting in a receptive field that expands exponentially with network depth.
  • Figure 2: Test accuracy vs. SNR on the custom synthetic dataset across methods and label budgets (2,800 vs. 14,000).
  • Figure 3: Confusion matrices averaged across clients and SNR for FedSSL-AMC, SimCSE-CNN+SVM, and FedDyn-CNN when each client has 14,000 labeled examples.
  • Figure 4: Accuracy vs. SNR on the synthetic dataset under combined label and carrier-frequency offset (CFO) heterogeneity across clients. The number of labeled examples is stated in parentheses.
  • Figure 5: Accuracy vs. SNR for the custom synthetic dataset under label and model heterogeneity (due to client-specific quantization).
  • ...and 1 more figure
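
The exponential growth of the receptive field described in the Figure 1 caption can be checked with a short calculation. The sketch below assumes layer $k$ (counting from 0) uses dilation $2^k$, as the caption states, and a common kernel size across layers; the kernel size and the function name `receptive_field` are assumptions for illustration, not values taken from the paper.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field, in input samples, of stacked dilated convolutions.

    Layer k (k = 0, 1, ...) uses dilation 2**k, following the Figure 1 caption.
    kernel_size is an assumed hyperparameter, not taken from the paper.
    """
    field = 1  # a single output sample sees itself before any layer is added
    for k in range(num_layers):
        # Each layer widens the field by (kernel_size - 1) taps spaced 2**k apart.
        field += (kernel_size - 1) * 2 ** k
    return field


# The receptive field roughly doubles with each extra layer.
for depth in (1, 2, 4, 8):
    print(depth, receptive_field(depth))
```

In closed form this is $1 + (K - 1)(2^L - 1)$ for kernel size $K$ and $L$ layers, which is why a modest depth can cover long I/Q sequences.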

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • Theorem 2