ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks

Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki

TL;DR

ReLaX-Net addresses the parameter-scale gap in Physical Neural Networks by reusing layers through time-multiplexing, implemented with fast switches to periodically switch among a small set of trainable weight matrices. The method yields an intermediate-scale, hardware-efficient architecture that bridges stateless RNNs and fully dynamic DNNs, and it is evaluated on SVHN image classification and Shakespeare NLP tasks. Results show performance gains over the baseline RNN, with the best outcomes arising from a balanced choice of the number of weight sets $L_{\rm{W}}$ and repetition length $L_{\rm{T}}$ under fixed parameter budgets; NLP results reveal some limits due to gradient issues at higher $L_{\rm{W}}$. The work highlights a practical pathway to scalable, energy-efficient PNNs by leveraging layer reuse and time-multiplexed computation, with future directions including experimental validation on photonic and spintronic platforms and optimisation of switching schemes.

Abstract

Physical Neural Networks (PNNs) are promising platforms for next-generation computing systems. However, recent advances in digital neural network performance are largely driven by the rapid growth in the number of trainable parameters, and the PNNs demonstrated so far lag behind by several orders of magnitude in terms of scale. This mirrors the size and performance constraints found in early digital neural networks. In that period, efficient reuse of parameters contributed to the development of parameter-efficient architectures such as convolutional neural networks. In this work, we numerically investigate hardware-friendly weight-tying for PNNs. Crucially, in many PNN systems, there is a time-scale separation between the fast dynamic active elements of the forward pass and the slowly trainable elements implementing weights and biases. With this in mind, we propose the Reuse of Layers for eXpanding a Neural Network (ReLaX-Net) architecture, which employs a simple layer-by-layer time-multiplexing scheme to increase the effective network depth and use the available parameters efficiently. We only require the addition of fast switches to existing PNNs. We validate ReLaX-Nets via numerical experiments on image classification and natural language processing tasks. Our results show that ReLaX-Net improves computational performance with only minor modifications to a conventional PNN. We observe favorable scaling, where ReLaX-Nets exceed the performance of equivalent traditional RNNs or DNNs with the same number of parameters.
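The layer-by-layer time-multiplexing scheme described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the update rule `h <- tanh(W h + b)`, the cyclic choice of parameter set `t % L_W`, the `tanh` nonlinearity, and all function names (`make_layer`, `step`, `relax_forward`) are assumptions made for illustration only.

```python
import math
import random

def make_layer(width, rng):
    """Return one hypothetical parameter set (W_hh, b_h) with small random weights."""
    scale = 1.0 / math.sqrt(width)
    W = [[rng.uniform(-scale, scale) for _ in range(width)] for _ in range(width)]
    b = [0.0] * width
    return W, b

def step(h, W, b):
    """One pass through the shared nonlinear element: tanh(W h + b) (assumed form)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
            for row, bi in zip(W, b)]

def relax_forward(h0, param_sets, L_T):
    """Run L_T steps, switching to parameter set (t mod L_W) at step t."""
    L_W = len(param_sets)
    h = list(h0)
    for t in range(L_T):
        W, b = param_sets[t % L_W]  # the "fast switch" (assumed cyclic order)
        h = step(h, W, b)
    return h

rng = random.Random(0)
width = 4
h0 = [0.5, -0.25, 0.1, 0.0]
sets = [make_layer(width, rng) for _ in range(6)]

rnn_out = relax_forward(h0, sets[:1], L_T=6)    # L_W = 1: fixed weights (RNN limit)
relax_out = relax_forward(h0, sets[:3], L_T=6)  # L_W = 3: periodic switching
dnn_out = relax_forward(h0, sets, L_T=6)        # L_W = L_T: new weights at every step
```

In this sketch the three calls differ only in how many parameter sets are supplied, which mirrors the idea that the same hardware can cover the range between a fixed-weight RNN and a DNN with distinct weights in every layer.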

Paper Structure

This paper contains 37 sections, 20 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Parameter growth of machine learning models on Electronic Computers and Physical Neural Networks. On conventional computers, the model sizes (AlexNet [krizhevsky2012imagenet], VGG16 [simonyan2014very], GPT-1 [radford2018improving], BERT-Large [devlin2019bert], Megatron-LM [shoeybi2019megatron], GPT-2 [radford2019language], T5-11B [raffel2020exploring], GPT-3 [brown2020language], Switch Transformer [fedus2022switch], PaLM [chowdhery2023palm]) grow rapidly. In contrast, the PNN models (Optoelectric Reservoir Computing (RC) [appeltant2011information], Photonic Integrated (PI) RC [vandoorne2014experimental], Nanophotonic Neural Network (NN) [shen2017deep], Spin-Torque Oscillator RC [torrejon2017neuromorphic], Micro-Ring Resonator (MRR) Recurrent Neural Network (RNN) [tait2017neuromorphic], Coupled Spin Oscillator Network [romera2018vowel], Photonic Spiking Neurosynaptic Network (SNN) [feldmann2019all], Photonic RC [bueno2018reinforcement], Memristor [yao2020fully], Diffractive Deep Neural Network (D${}^2$NN) [lin2018all], Chemical Autoencoder [parrilla2020programmable], Mars [ramey2020silicon], MEMS [sun2021novel], Electrochemical RC [kan2022physical], Magnetoresistive Random-Access Memory (MRAM) Crossbar Array [jung2022crossbar], Physics-Aware Training (PAT) [wright2022deep] and Degenerate Optical Parametric Oscillator (DOPO) Network [inagaki2021collective]) grow only gradually, leaving a huge gap between Electronic Computers and Physical Neural Networks. See Supplemental Information Section S1 for details.
  • Figure 2: Concepts of "time multiplexing". $\mathbf{a}$: While the state $h_l$ in a conventional network evolves when passing through distinct layers $l$, in a time-multiplexed system the state $h[t]$ evolves over several time steps $t$ (see Eq. \ref{eq:time-mutliplexed-DNN}). $\mathbf{b}$: By reusing the same nonlinear element, but reconfiguring the network weights $W_{\rm{hh}}[t]$ (and biases $b[t]$) at each step $t$, we can reproduce the dynamics of a standard DNN (Eq. \ref{eq:time-mutliplexed-DNN}). $\mathbf{c}$: If the matrix $W_{\rm{hh}}$ (and bias) is fixed (independent of $t$), the system implements a standard RNN (Eq. \ref{eq:RNN}). $\mathbf{d}$: In the ReLaX-Net architecture proposed in Sec. \ref{sec:ReLaX-Net}, the network weights $W_{\rm{hh}}[t]$ and biases $b[t]$ are switched periodically at every time step.
  • Figure 3: Reuse of Layers for eXpanding a Neural Network (ReLaX-Net). a: The forward propagation switches between separate sets of hidden layers with different parameters $\theta=(W_{\rm{hh}},b_{\rm{h}})$, see Eq. \ref{eq:relax_net}. This enables the hardware-efficient reuse of many components while only requiring high-speed switches (parameter reconfiguration can happen at much slower speeds). b: If we consider how layers are chained temporally, this forms a deep neural network. The total depth is given by the number of repetitions $L_{\rm{T}}$. c, d, e: Examples of proposed implementations for PNN components: c: Micro-Ring Resonator (MRR), Mach-Zehnder Interferometer (MZI) and Magnetic Tunnel Junctions (MTJ) Crossbar Arrays for trainable parameters; d: Digital Mirror Devices (DMD), Fiber Switches and Current Switches for switching devices; e: Photo Detectors or Electrical Processing for the nonlinear activation function.
  • Figure 4: Examples of ReLaX-Net. From left to right, the cases are: $(L_{\rm{W}}, L_{\rm{T}})= (1,2), (2,4), (4,7)$. The gray solid lines represent the hidden states, while the colored arrows in between represent the parameters of the transformations (weights and biases). Shared colors indicate that parameters are reused, whereas differing colors indicate unique parameters. $L_{\rm{T}}$ corresponds to the number of layers and $L_{\rm{W}}$ corresponds to the number of distinct arrow colors, i.e., parameter sets.
  • Figure 5: ReLaX-Net performance on the SVHN image classification task. The number of unique weight matrices $L_{\rm{W}}$ is varied while the number of layers is kept fixed at $L_{\rm{T}}=12$. $\mathbf{a}$: Error Rate and Parameter Footprint when varying $L_{\rm{W}}$. Blue dots and lines show the Error Rate, and orange bars show the number of trainable parameters in the hidden layers, corresponding to the hardware footprint. The dashed line is the performance of the stateless RNN limit for $L_{\rm{W}} = 1$. $\mathbf{b}$: Illustration of ReLaX-Net architectures. Gray bars represent the hidden layers, and arrows indicate hidden weights. (Left) Reusing the same hidden weight ($L_{\rm{W}}=1$). (Middle) Repeating two weights ($L_{\rm{W}}=2$). (Right) Reconfiguring at every time step ($L_{\rm{W}}=12$). ReLaX-Net with $L_{\rm{W}}\geq2$ performs better than the standard RNN case $L_{\rm{W}}=1$. In particular, with only a few additional layers, ReLaX-Net can outperform standard RNNs.
  • ...and 7 more figures
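
The relationship between $L_{\rm{W}}$, $L_{\rm{T}}$ and the hidden-layer parameter footprint described in the captions of Figures 4 and 5 can be made concrete with a short Python sketch. The cyclic ordering of parameter sets, the hidden width of 64, the listed $L_{\rm{W}}$ values, and the helper names `weight_schedule` and `hidden_params` are illustrative assumptions and are not taken from the paper.

```python
def weight_schedule(L_W, L_T):
    """Index of the parameter set used at each of the L_T steps (assumed cyclic)."""
    return [t % L_W for t in range(L_T)]

def hidden_params(L_W, width):
    """Trainable hidden parameters: L_W square weight matrices plus L_W bias vectors."""
    return L_W * (width * width + width)

# The three (L_W, L_T) cases named in the Figure 4 caption.
for L_W, L_T in [(1, 2), (2, 4), (4, 7)]:
    print((L_W, L_T), weight_schedule(L_W, L_T))

# Footprint at fixed depth L_T = 12 for a hypothetical hidden width of 64.
width = 64
for L_W in (1, 2, 3, 4, 6, 12):
    print(L_W, hidden_params(L_W, width))
```

Under these assumptions the footprint grows linearly with $L_{\rm{W}}$ and is independent of $L_{\rm{T}}$, which is why depth can be added through repetition without adding trainable hardware.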