Metasurfaces-Integrated Wireless Neural Networks for Lightweight Over-The-Air Edge Inference

Kyriakos Stylianopoulos; Mario Edoardo Pandolfo; Paolo Di Lorenzo; George C. Alexandropoulos

TL;DR

The training of the MINN framework, two representative variations, and performance results for indicative applications are presented, highlighting the potential of MINNs as a lightweight and sustainable solution for future EI-enabled wireless systems.

Abstract

The upcoming sixth generation (6G) of wireless networks envisions ultra-low-latency and energy-efficient Edge Inference (EI) for diverse Internet of Things (IoT) applications. However, traditional digital hardware for machine learning is power-intensive, motivating the need for alternative computation paradigms. Over-The-Air (OTA) computation is regarded as an emerging transformative approach that assigns the wireless channel to actively perform computational tasks. This article introduces the concept of Metasurfaces-Integrated Neural Networks (MINNs), a physical-layer-enabled deep learning framework that leverages programmable multi-layer metasurface structures and Multiple-Input Multiple-Output (MIMO) channels to realize computational layers in the wave propagation domain. The MINN system is conceptualized as three modules: Encoder, Channel (uncontrollable propagation features and metasurfaces), and Decoder. The first and last modules, realized respectively at the multi-antenna transmitter and receiver, consist of conventional digital or purposely designed analog Deep Neural Network (DNN) layers, and the metasurface responses of the Channel module are optimized jointly with the other modules as trainable weights. This architecture enables computation offloading into the end-to-end physical layer, flexibly among its constituent modules, achieving performance comparable to fully digital DNNs while significantly reducing power consumption. The training of the MINN framework, two representative variations, and performance results for indicative applications are presented, highlighting the potential of MINNs as a lightweight and sustainable solution for future EI-enabled wireless systems. The article concludes with a list of open challenges and promising research directions.
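The three-module composition described in the abstract can be illustrated with a minimal forward-pass sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names (`matvec`, `ms_layer`, `minn_forward`), the use of a single phase-only metasurface layer instead of a multi-layer structure, the purely linear Encoder and Decoder, and the magnitude nonlinearity at the receiver.

```python
import cmath
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def ms_layer(phases, v):
    """Hypothetical metasurface layer: each element applies a
    unit-modulus phase shift exp(j*phase) to the impinging signal."""
    return [cmath.exp(1j * p) * x for p, x in zip(phases, v)]

def minn_forward(x, enc_w, h_tx_ms, phases, h_ms_rx, dec_w,
                 noise_std=0.0, rng=None):
    """Encoder -> Channel (fading + one MS layer + noise) -> Decoder."""
    rng = rng or random.Random(0)
    # Encoder (assumed linear): input features -> transmitted symbols.
    s = matvec(enc_w, x)
    # Channel: uncontrollable TX-to-MS fading, trainable MS phases,
    # uncontrollable MS-to-RX fading, then additive RX thermal noise.
    z = ms_layer(phases, matvec(h_tx_ms, s))
    y = matvec(h_ms_rx, z)
    y = [v + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
         for v in y]
    # Decoder (assumed): magnitude nonlinearity, then a linear read-out.
    return matvec(dec_w, [abs(v) for v in y])
```

In such a sketch, `enc_w`, `phases`, and `dec_w` would play the role of the trainable weights, while `h_tx_ms` and `h_ms_rx` stand for the uncontrollable propagation. Because the output is a composition of differentiable maps, the gradient with respect to `phases` follows from the chain rule, which is the property the abstract relies on for joint training.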

Paper Structure (11 sections, 6 figures)

Introduction
Deep Diffractive Neural Networks
Basic Principle
Integration in Wireless Systems
Metasurfaces-Integrated Neural Networks
Overall Architecture
Training for Static and Dynamic MS Responses
Two MINN Architecture Variations
Example MINN Applications
Open Challenges and Future Directions
Conclusion

Figures (6)

Figure 1: Conceptual architecture of the Metasurfaces-Integrated Neural Network (MINN) framework comprising three core modules. The Encoder and Decoder modules, which may incorporate neural network structures, are collocated respectively at the multi-antenna TX and RX nodes, while the Channel module performs OTA computations leveraging the fading coefficients of the wireless channel, the programmable EM responses of the multi-layer MS structures constituting it, as well as the properties of the RX thermal noise. The E2E MIMO system is described as the composition of these three modules; therefore, the chain rule may be applied to compute the respective gradients and, consequently, to optimize the heterogeneous DNN weights and constituent MS responses through gradient descent. It is noted that: i) the physical MS device(s) enabling the reconfigurability of the signal propagation environment may be collocated with the TX, the RX, or both, instead of being placed in between them (in this case, it is probable that the MS(s) affect only the portion of the signal components impinging on them); and ii) the Encoder and Decoder DNN components can be implemented either through conventional digital processors or equivalent analog computation units (e.g., liquid state machines, memristors, and multi-layer MS structures, such as Stacked Intelligent Metasurfaces (SIM)).
Figure 2: Mean accuracy of different MINN versions for MNIST classification, considering fixed SNR during training and inference. All simulated $4\times4$ MIMO system setups included a SIM positioned close to the TX at a distance corresponding to approximately 7.5% of the total TX-RX distance, with its broadside perpendicular to the TX-RX line of sight; the setups differ in the number of MS layers and the number of metamaterials per layer. A geometric channel with $10$ scatterers, yielding static fading conditions, has been considered. Two convolutional layers followed by a linear layer were used at the TX digital DNN, while the RX digital DNN comprised three feedforward layers. The "No SIM" baseline refers to performing inference with only the digital DNNs at the transceivers, i.e., without manipulating the wireless channel through any MS (a simplified MINN variation including only the uncontrollable component of the Channel module). The "Digital DNN" benchmark refers to performing MNIST classification entirely on the TX with the same number of layers, without accounting for channel transmission, and is therefore an upper bound. All networks were trained for $50$ epochs irrespective of their size. As observed, by increasing the number of SIM elements and the received SNR level, larger MINNs approach the Digital DNN bound, whereas the removal of the SIM is detrimental to the training process. It is additionally demonstrated that, under lower SNRs, deeper MINN versions are not always more efficient, since they suffer from higher signal attenuation through the SIM layers.
Figure 3: Two variations of the MINN architecture for the example of MNIST handwritten digit classification: (a) An all-MSs MIMO system where all three modules are primarily implemented by MSs. At the TX, the forward network pass initiates through a single antenna illuminating the first layer of the E2E wave-domain-based DNN, which has as many elements as the number of features within the input data (i.e., the number of MNIST image pixels). This data is encoded via the EM responses of this layer's diffractive metamaterials [LML23_D2NN]. At the RX, the final MS layer, consisting of ten fully absorbing metamaterials each followed by an energy detector (with each representing one of the possible MNIST digits), provides the output inferred digit. Although conceptually regarded as parts of the Channel module handling all OTA computations, the hidden diffractive MS layers may be flexibly distributed across all physical devices of the system. For example, some can be collocated with the transceiver devices and the remaining ones installed all together or in groups within the signal propagation environment. Alternatively, those diffractive MSs may be placed at one of the end devices and inside the MIMO channel, or solely within that channel, enabling lightweight transmissions, receptions, or both. (b) An XL MIMO system where the Encoder and Decoder modules are respectively carried out by a multi-antenna TX and RX, and the Channel module is devoid of programmable devices [Stylianopoulos_MIMO_ELM]. At the TX, each pixel of the input data is encoded and fed to a distinct antenna, while, at the RX, the signal received at each antenna is first fed to a nonlinear RF component, then multiplied by a controllable weight, and, finally, all weighted analog outputs are combined to provide the output inferred digit. This XL MIMO system capitalizes on the random transformations imposed by the uncontrollable Channel module on the feature signals before they are superimposed at the RX antennas, operating as an OTA ELM.
Figure 4: Mean accuracy of the MINN-ELM variation in Fig. 3(b) over different binary classification datasets at the received SNR level of $25$ dB. XL MIMO system setups with different numbers of TX antenna elements and MS sizes at the RX were simulated under a Rayleigh fading channel, which was treated as the random single hidden layer of the E2E wave-domain-based neural network. Each constituent metamaterial of the last MS-based layer at the RX was designed to realize a cascade of a fixed nonlinear response (thus, acting as an activation function) followed by a tunable linear response (thus, acting as a trainable weight). The MINN-ELM training took place in closed form within each channel coherence block. The number of TX antennas corresponded to the number of input features of the dataset used: $22$ for Parkinson's; $60$ (sub-sampled from $784$) for binary (even/odd) MNIST; $30$ for Wisconsin Breast Cancer Diagnosis (WBCD); and $20$ (sub-sampled from $590$) for Semiconductor Manufacturing (SECOM). The MINN-ELM classification performance has been compared with that of a fully digital ELM implementation, considering $200$ random initializations over four datasets. As observed, the MINN-ELM performs as well as its digital counterpart in all scenarios. The performance increases as its approximation power is enhanced by increasing the number of MS elements at the RX, with the exception of the SECOM dataset, which suffers from overfitting. In sufficiently XL MIMO conditions, MINN-ELM achieves near-optimal classification for all datasets, approaching the asymptotic theoretical guarantees of universal approximation.
Figure 5: Mean accuracy of MINN with power control for MNIST classification versus the TX power level, compared with MINN trained under constant power budgets. A $16 \times 8$ MIMO system setup with two different sizes of $4$-layer SIM, positioned close to the TX similarly to Fig. 2, operating under a geometric channel with $15$ scatterers, was simulated. At each channel realization, the RX position was randomly sampled, resulting overall in dynamic fading conditions. The penalty term $\gamma$ is a hyper-parameter balancing energy efficiency and classification performance. As depicted, classification with power control yields the desired accuracy levels with orders of magnitude lower TX power during inference. This behavior illustrates the suitability of this MINN application for low and varying SNR levels.
...and 1 more figure
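The caption of Figure 4 describes an extreme learning machine (ELM) in which a random fading channel acts as a fixed hidden layer, a fixed nonlinearity follows, and only the final linear weights are trained in closed form. The sketch below illustrates that idea under stated assumptions: the magnitude nonlinearity, the ridge-regularized least-squares solution, the regularization value, and all function names (`solve`, `hidden`, `train_elm`, `predict`) are hypothetical choices made here, not the article's implementation.

```python
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w

def rayleigh_channel(n_rx, n_tx, rng):
    """Random complex Gaussian channel matrix (Rayleigh fading)."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_tx)]
            for _ in range(n_rx)]

def hidden(x, H):
    """Fixed random hidden layer: channel mixing followed by an
    assumed magnitude nonlinearity at each RX element."""
    return [abs(sum(h * xi for h, xi in zip(row, x))) for row in H]

def train_elm(X, y, H, ridge=1e-3):
    """Closed-form ridge least squares for the output weights only."""
    Z = [hidden(x, H) for x in X]
    m = len(H)
    A = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(m)]
    return solve(A, b)

def predict(x, H, w):
    """Binary decision in {-1, +1} from the weighted hidden outputs."""
    score = sum(wi * zi for wi, zi in zip(w, hidden(x, H)))
    return 1 if score >= 0 else -1
```

Since the channel `H` is never trained, the weights would need to be recomputed whenever the channel changes, which is consistent with the caption's statement that training takes place within each channel coherence block.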
