ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks
Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki
TL;DR
ReLaX-Net addresses the parameter-scale gap in Physical Neural Networks (PNNs) by reusing layers through time-multiplexing: fast switches periodically cycle through a small set of trainable weight matrices. The method yields an intermediate-scale, hardware-efficient architecture that bridges stateless RNNs and fully dynamic DNNs, and it is evaluated on SVHN image classification and a Shakespeare NLP task. Results show performance gains over the baseline RNN, with the best outcomes arising from a balanced choice of the number of weight sets $L_{\rm{W}}$ and the repetition length $L_{\rm{T}}$ under a fixed parameter budget; the NLP results reveal some limits due to gradient issues at higher $L_{\rm{W}}$. The work highlights a practical pathway to scalable, energy-efficient PNNs through layer reuse and time-multiplexed computation, with future directions including experimental validation on photonic and spintronic platforms and optimisation of switching schemes.
Abstract
Physical Neural Networks (PNNs) are promising platforms for next-generation computing systems. However, recent advances in digital neural network performance have been driven largely by rapid growth in the number of trainable parameters, and the PNNs demonstrated so far lag behind by several orders of magnitude in scale. This mirrors the size and performance constraints of early digital neural networks. In that period, efficient reuse of parameters contributed to the development of parameter-efficient architectures such as convolutional neural networks. In this work, we numerically investigate hardware-friendly weight-tying for PNNs. Crucially, many PNN systems exhibit a time-scale separation between the fast, dynamic active elements that carry out the forward pass and the elements implementing weights and biases, which can be trained only slowly. With this in mind, we propose the Reuse of Layers for eXpanding a Neural Network (ReLaX-Net) architecture, which employs a simple layer-by-layer time-multiplexing scheme to increase the effective network depth and use the available parameters more efficiently. This requires only the addition of fast switches to existing PNNs. We validate ReLaX-Nets via numerical experiments on image classification and natural language processing tasks. Our results show that ReLaX-Net improves computational performance with only minor modifications to a conventional PNN. We observe favorable scaling, with ReLaX-Nets exceeding the performance of equivalent traditional RNNs or DNNs with the same number of parameters.
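The layer-by-layer time-multiplexing idea can be illustrated with a short numerical sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes square weight matrices, a tanh nonlinearity, a round-robin switching order (step k uses weight set k mod $L_{\rm{W}}$), and an effective depth of $L_{\rm{W}} \times L_{\rm{T}}$. All function names are hypothetical.

```python
import numpy as np


def init_weight_sets(n_sets, width, seed=0):
    """Create n_sets (L_W) trainable (weight, bias) pairs of equal shape."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(width)
    return [
        (rng.normal(0.0, scale, size=(width, width)), np.zeros(width))
        for _ in range(n_sets)
    ]


def relax_forward(x, weight_sets, n_repeats):
    """Forward pass that cycles through the weight sets n_repeats (L_T) times.

    Assumed switching rule: step k uses weight set k mod L_W, so the
    effective depth is L_W * L_T while the parameter count stays at L_W sets.
    """
    n_sets = len(weight_sets)
    h = x
    for step in range(n_sets * n_repeats):
        W, b = weight_sets[step % n_sets]  # the "fast switch" selects a set
        h = np.tanh(h @ W + b)
    return h


def count_parameters(weight_sets):
    """Number of trainable parameters, independent of the repetition length."""
    return sum(W.size + b.size for W, b in weight_sets)


# Example: L_W = 2 weight sets reused L_T = 3 times gives 6 effective layers.
weight_sets = init_weight_sets(n_sets=2, width=16)
x = np.ones((4, 16))
y = relax_forward(x, weight_sets, n_repeats=3)
n_params = count_parameters(weight_sets)
```

In this sketch, increasing `n_repeats` deepens the computation without changing `n_params`, which is the trade-off between $L_{\rm{W}}$ and $L_{\rm{T}}$ under a fixed parameter budget that the summary above refers to.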
