Blind Federated Learning via Over-the-Air q-QAM

Saeed Razavikia; José Mairton Barros Da Silva Júnior; Carlo Fischione

TL;DR

This paper addresses federated edge learning (FEEL) over fading multiple-access channels where edge devices lack channel state information (CSI), and seeks to reduce uplink latency using digital over-the-air computation with $q$-QAM. ChannelCompFed combines receive beamforming with multiple antennas at the edge server (ES), a closed-form high-order QAM encoding/decoding scheme, and a no-CSI transmit strategy to enable accurate gradient aggregation. The authors derive non-asymptotic MSE bounds under noisy and fading conditions, establish a probabilistic antenna requirement $N_r = O(1/\sigma^2)$ for convergence, and prove convergence rates for non-convex objectives. Numerical validations on MNIST and CIFAR-10 show accuracy gains of up to about 60% with more antennas and higher modulation orders. Practically, the approach offers low-latency, spectrally efficient FEEL suitable for wireless edge deployments, with concrete system-design guidelines and demonstrated performance improvements over analog FEEL and orthogonal baselines.

Abstract

In this work, we investigate federated edge learning over a fading multiple access channel. To alleviate the communication burden between the edge devices and the access point, we introduce a digital over-the-air computation strategy employing q-ary quadrature amplitude modulation, resulting in a low-latency communication scheme. Specifically, we propose a new federated edge learning framework in which edge devices use digital modulation for over-the-air uplink transmission to the edge server while having no access to the channel state information. Furthermore, we incorporate multiple antennas at the edge server to overcome the fading inherent in wireless communication, and we analyze the number of antennas required to mitigate the fading impact effectively. We prove a non-asymptotic upper bound on the mean squared error of the proposed federated learning with digital over-the-air uplink transmissions under both noisy and fading conditions. Leveraging the derived upper bound, we characterize the convergence rate of the learning process for a non-convex loss function in terms of the mean squared error of the gradients due to the fading channel. We also support the theoretical guarantees with numerical experiments on the mean squared error and the convergence of the digital federated edge learning framework. Notably, the results demonstrate that increasing the number of antennas at the edge server and adopting higher-order modulations improve the model accuracy by up to 60%.
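The role of the edge-server antennas can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: i.i.d. complex Gaussian (Rayleigh) channels, devices that transmit their symbols without any channel compensation, and an edge server that knows the sum of the channel vectors and uses it as a combiner. The helper names `cgauss`, `blind_ota_sum`, and `empirical_mse` are hypothetical, and the combiner is a generic channel-hardening rule rather than the paper's beamforming design.

```python
import random

def cgauss(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample."""
    s = (variance / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def blind_ota_sum(symbols, n_rx, rng, sigma_h2=1.0, sigma_z2=1.0):
    """Estimate sum(symbols) from one blind over-the-air transmission.

    Assumption: the server knows the sum of the channels at each antenna.
    """
    acc = 0j
    for _ in range(n_rx):
        # Independent fading coefficient from every device to this antenna.
        h = [cgauss(rng, sigma_h2) for _ in symbols]
        # Superposed signal at this antenna plus receiver noise.
        y = sum(hk * sk for hk, sk in zip(h, symbols)) + cgauss(rng, sigma_z2)
        # Combine with the conjugate of the summed channel.
        acc += sum(h).conjugate() * y
    # Cross-device terms average out as n_rx grows, leaving sigma_h2 * sum(symbols).
    return acc / (n_rx * sigma_h2)

def empirical_mse(symbols, n_rx, trials=50, seed=0):
    """Average squared error of the estimate over independent trials."""
    rng = random.Random(seed)
    target = sum(symbols)
    errors = (abs(blind_ota_sum(symbols, n_rx, rng) - target) ** 2
              for _ in range(trials))
    return sum(errors) / trials

symbols = [1 + 1j, 3 - 1j, -1 + 3j, -3 - 3j, 1 - 3j]  # one symbol per device
mse_few = empirical_mse(symbols, n_rx=4)
mse_many = empirical_mse(symbols, n_rx=256)
print(mse_few, mse_many)
```

Under these assumptions the error of the combined estimate shrinks roughly in proportion to $1/N_r$, which is the qualitative behavior the abstract attributes to adding antennas.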

Paper Structure (27 sections, 10 theorems, 96 equations, 7 figures)

Introduction
Literature Review
Our Contribution
Organization of The Paper
Notations
System Model
Learning Model
Communication Model
ChannelCompFed: Encoding and Decoding
Encoder and Decoder
Theoretical Convergence Analysis
MSE Analysis
Number of Antennas for Convergence
Convergence Analysis
Numerical Experiments
...and 12 more sections

Key Result

Proposition 1

Let $\tilde{g}_k\in \{0,1,\ldots, q-1\}$ be an integer value for $k\in [K]$ and $\tilde{g} = \sum_{k=1}^K\tilde{g}_k/K \in \{0,1/K,\ldots,q-1-1/K, q-1\}$ be the average value. Then, the decoding and encoding functions $\mathscr{D}_{q}$ and $\mathscr{E}_{q}$ are identity operators with respect to ...
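To make the setting of the proposition concrete, the sketch below shows one simple way integers in $\{0,\ldots,q-1\}$ can be mapped to a square QAM-like grid so that the noiseless superposition of the transmitted symbols decodes to the average on the $1/K$ grid. The mapping is an illustrative assumption (base-$\sqrt{q}$ digits placed on the in-phase and quadrature axes), not the paper's closed-form $\mathscr{E}_q$ and $\mathscr{D}_q$, and the names `encode` and `decode_average` are hypothetical.

```python
import math
import random

def encode(g, q):
    """Map an integer g in {0, ..., q-1} to a point on a sqrt(q) x sqrt(q) grid."""
    m = math.isqrt(q)
    assert m * m == q and 0 <= g < q  # this sketch assumes q is a perfect square
    # Low digit on the in-phase axis, high digit on the quadrature axis.
    return complex(g % m, g // m)

def decode_average(r, q, K):
    """Recover the average of K integers from the superposed symbol r."""
    m = math.isqrt(q)
    # The real part carries the sum of low digits, the imaginary part the sum of high digits.
    total = round(r.real) + m * round(r.imag)
    return total / K

rng = random.Random(1)
q, K = 16, 10
values = [rng.randrange(q) for _ in range(K)]
received = sum(encode(g, q) for g in values)  # noiseless superposition over the channel
average = decode_average(received, q, K)
print(values, average)
```

Because both digit sums add linearly over the channel, the decoded value equals $\sum_k \tilde{g}_k / K$ exactly in the noiseless case and therefore always lies in $\{0, 1/K, \ldots, q-1\}$.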

Figures (7)

Figure 1: Diagram of federated edge learning in which over-the-air computation is performed with digital QAM modulation. The dashed blue arrows show the downlink transmission, where the ES sends the updated global model in Eq. \ref{eq:aggreg} back to the edge devices. The red lines show the uplink phase, where device $k$ transmits the parameters of the model trained on its local data to the ES using the modulated signal $\bm{s}_k$ in Eq. \ref{eq:Sk}.
Figure 2: Block diagram of the communication model for FEEL at the $n$-th communication subchannel. The gradients $g_{1}^n, g_{2}^n, \ldots, g_{K}^n$ are first quantized by the operator $\mathcal{Q}_q(\cdot)$ and then pass through the encoder $\mathscr{E}_q(\cdot)$ and the pre-processing function $\varphi$. All edge devices then transmit the modulated signals $s_{1}^n, s_{2}^n, \ldots, s_{K}^n$ over the MAC, where they are degraded by the noise $\bm{z}^n$ and the wireless channel effects $\bm{h}^n$. The signal received by the $N_r$ antennas, $\bm{y}^n$, is combined at the ES using the receive beamforming vector $\bm{u}^n$. The resulting signal is passed through the post-processing function $\psi$ to obtain the $n$-th element of the received vector $\bm{r}$, i.e., $r^n$. Finally, $r^n$ is decoded by the decoder $\mathscr{D}_q(\cdot)$ to yield the estimated function $\hat{g}^n$.
Figure 3: Monte Carlo numerical evaluation of the summation function over $10$ trials versus the analytical results from Theorem \ref{th:NRerror} for $\delta = 0.01$ and different numbers of antennas, $N_r$. The channel coefficients and channel noise are generated from $N(\bm{0}, \sigma_h\bm{I}_{N_r})$ and $\mathcal{CN}(\bm{0}, \sigma_z\bm{I}_{N_r})$, respectively, with $\sigma_h = \sigma_z = 1$. Figure \ref{fig:ConcentrateS(a)} shows the empirical MSE of $\hat{s}^n$, the analytical upper bound on the error in \ref{eq:epsilonupp}, and the expected value in \ref{eq:Expecepsilonupp} from Theorem \ref{th:NRerror}, for $K=200$ edge devices. Figure \ref{fig:ConcentrateS(b)} shows the empirical MSE and the analytical upper bound on the MSE of the gradient $\hat{\bm{g}}$ from Proposition \ref{cor:norm2}, for $K=20$ edge devices whose gradient elements, $\bm{g}_k$, are generated uniformly at random from $\mathcal{U}[-2,2]$.
Figure 4: Monte Carlo numerical evaluation of the average gradient estimate in \ref{eq:MSEGradiant} against the true gradient over $100$ trials, versus the analytical results from Proposition \ref{Pr:MSE}. Here, we consider $q = 64$ and $N = 100$ for two cases, $K=50$ and $K=400$ edge devices. The gradients $\bm{g}_k$ are generated uniformly at random from $\mathcal{U}[0,64]$ and $\mathcal{U}[0,32]$.
Figure 5: Accuracy on the MNIST task as a function of the communication rounds for $K=20$ edge devices with a heterogeneous data distribution across edge devices. Figures \ref{fig:MNIST(a)} and \ref{fig:MNIST(b)} show the accuracy of FEEL versus the number of communication rounds for low noise variance, $\sigma_z^2 = 1$, and high noise variance, $\sigma_z^2 = 10$, respectively.
...and 2 more figures
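
The quantization step $\mathcal{Q}_q(\cdot)$ named in the Figure 2 caption can be sketched as a uniform quantizer. This is a minimal illustration under assumptions not stated in the captions: gradients are clipped to a known range $[g_{\min}, g_{\max}]$ and mapped to $q$ evenly spaced levels. The function names `quantize` and `dequantize` and the chosen range are hypothetical.

```python
def quantize(g, q, g_min, g_max):
    """Map a real gradient entry to an integer level in {0, ..., q-1}."""
    g = min(max(g, g_min), g_max)      # clip to the assumed dynamic range
    step = (g_max - g_min) / (q - 1)   # spacing between adjacent levels
    return round((g - g_min) / step)

def dequantize(level, q, g_min, g_max):
    """Map a (possibly fractional) level back to the gradient scale."""
    step = (g_max - g_min) / (q - 1)
    return g_min + level * step

q, g_min, g_max = 16, -2.0, 2.0
gradients = [-1.7, 0.3, 1.9, -0.4]     # one entry per device, illustrative values
levels = [quantize(g, q, g_min, g_max) for g in gradients]

# The map back to the gradient scale is affine, so averaging the integer levels
# and dequantizing once gives the same result as averaging dequantized values.
avg_level = sum(levels) / len(levels)
avg_gradient = dequantize(avg_level, q, g_min, g_max)
print(levels, avg_gradient)
```

The affine relationship is what allows the server to work with the average of the integer levels, as in Proposition 1, and convert back to the gradient scale only once per entry.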

Theorems & Definitions (25)

Remark 1
Remark 2
Proposition 1
proof
Remark 3
Remark 4
Proposition 2
proof
Theorem 1
proof
...and 15 more
