Noise-Aware Training of Neuromorphic Dynamic Device Networks

Luca Manneschi, Ian T. Vidamour, Kilian D. Stenning, Charles Swindells, Guru Venkat, David Griffin, Lai Gui, Daanish Sonawala, Denis Donskikh, Dana Hariga, Susan Stepney, Will R. Branford, Jack C. Gartside, Thomas Hayward, Matthew O. A. Ellis, Eleni Vasilaki

TL;DR

The authors present a data-driven framework for optimizing networks of arbitrary dynamic systems that is robust to noise and enables tasks such as neuroprosthetic control.

Abstract

Physical computing has the potential to enable widespread embodied intelligence by leveraging the intrinsic dynamics of complex systems for efficient sensing, processing, and interaction. While individual devices provide basic data processing capabilities, networks of interconnected devices can perform more complex and varied tasks. However, designing networks to perform dynamic tasks is challenging without physical models and accurate quantification of device noise. We propose a novel, noise-aware methodology for training device networks using Neural Stochastic Differential Equations (Neural-SDEs) as differentiable digital twins, accurately capturing the dynamics and associated stochasticity of devices with intrinsic memory. Our approach employs backpropagation through time and cascade learning, allowing networks to effectively exploit the temporal properties of physical devices. We validate our method on diverse networks of spintronic devices across temporal classification and regression benchmarks. By decoupling the training of individual device models from network training, our method reduces the required training data and provides a robust framework for programming dynamical devices without relying on analytical descriptions of their dynamics.
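The role of a Neural-SDE digital twin can be illustrated with a minimal Euler-Maruyama sketch. This is not the authors' model: the `drift` and `diffusion` functions below are hypothetical closed-form stand-ins for the two trained neural networks, and the step size, input signal, and noise scale are arbitrary assumptions. The sketch only shows the structural point made in the abstract: with the noise term switched off the twin behaves like a deterministic ODE model, while with it switched on, repeated runs of the same input produce a distribution of trajectories.

```python
import math
import random


def drift(x, u):
    # Hypothetical stand-in for the deterministic network:
    # a leaky state driven by a saturating function of the input.
    return -x + math.tanh(u)


def diffusion(x, u):
    # Hypothetical stand-in for the stochastic network:
    # a state-dependent noise amplitude.
    return 0.1 * (1.0 + abs(x))


def simulate(x0, inputs, dt=0.1, noisy=True, rng=None):
    """Euler-Maruyama integration:
    x[t+1] = x[t] + drift * dt + diffusion * sqrt(dt) * xi, with xi ~ N(0, 1).
    With noisy=False the noise term vanishes and this reduces to forward Euler."""
    if rng is None:
        rng = random.Random()
    x = x0
    traj = [x0]
    for u in inputs:
        xi = rng.gauss(0.0, 1.0) if noisy else 0.0
        x = x + drift(x, u) * dt + diffusion(x, u) * math.sqrt(dt) * xi
        traj.append(x)
    return traj


# One fixed input sequence, replayed many times (an assumed test signal).
inputs = [math.sin(0.3 * t) for t in range(50)]

# Deterministic (ODE-like) twin: a single trajectory.
ode_traj = simulate(0.0, inputs, noisy=False)

# Stochastic (SDE-like) twin: 100 sampled trajectories for the same input.
sde_trajs = [simulate(0.0, inputs, rng=random.Random(seed)) for seed in range(100)]
```

In a trained twin, both functions would be differentiable networks fitted to recorded device data, so that gradients can flow through each integration step during backpropagation through time.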

Paper Structure (10 sections, 26 equations, 13 figures)


Figures (13)

  • Figure 1: Overview of the dynamical network optimisation framework. (a) Model Generation: Experimental devices are driven under random inputs, their observable states are recorded, and these data are used to fit models of device dynamics. (b) Network Simulation: A neural network is constructed where each node replicates the dynamics of the original device, using the trained model. Parameters controlling device interactions (network weights) are optimised for a task via backpropagation through time (BPTT) or truncated-BPTT on the interacting digital twins. (c) Experimental Transfer: The parameters optimised in simulation are transferred like-for-like to experimental networks where each node is a real device, and task performance is evaluated.
  • Figure 2: Modelling and optimising dynamic behaviours. (a) Schematic analogy of temporal dependencies. Altering an action in the past has consequences for all future actions. Similarly, for backpropagation through time, changes to the final output caused by all past inputs and states must be taken into consideration. (b) Schematic showing samples from distributions of initial conditions, which subsequently affect the predicted trajectory. Grey clouds show the distribution of all gathered data for a given random input sequence, while red lines highlight specific trajectories. (c) Schematic diagram of the Neural-SDE architecture. Inputs of device states (activities), external driving stimuli, and auxiliary variables feed into a pair of distinct neural networks that handle the deterministic (upper network) and stochastic (lower network) behaviours. The outputs of these networks feed into a numerical ODE solver, generating predictions of both activities and auxiliary variables for the next timestep. The results are recursively fed back as inputs to the next timestep prediction, generating predicted trajectories from initial conditions and external driving signals. Black arrows show forward propagation of activities; orange arrows show backward propagation of gradients. (d) Comparison between predictions generated via Neural-ODE and Neural-SDE models. The Neural-ODE produces a single deterministic outcome for a given set of initial conditions and input stimuli, shown by the yellow line. The Neural-SDE instead generates sampled trajectories from a distribution based on the learned noise characteristics. The black lines show 100 generations of a signal via the Neural-SDE, while red lines show real experimental data from repeated identical input sequences. As in (b), blue circles represent selected initial conditions and the grey clouds represent the distributions observed across all experiments.
  • Figure 3: Partially-observable MNIST and neuroprosthetic movement classification tasks. (a) The MNIST data, presented as sequences of images, have been adapted into a temporal problem by partially obscuring the images at each time step, requiring the system to integrate information over time for accurate classification. The neuroprosthetic gesture recognition task is characterized by input channels that vary over time. (b) Example responses from the network’s physical nodes, showing experimentally measured responses (red) and digital twins’ responses (black) for different nodes across two layers. The gray areas represent the distribution of responses from the digital twins, while the dashed arrows illustrate the flow of information from the input through the layers to the output. The horizontal bars indicate the output activations of different physical devices representing classes compared to the model output, with the correct class highlighted in red. (c) Transferred performance of nanoring array networks using Neural-ODEs and Neural-SDEs as digital twins in the MNIST benchmark. The deterministic Neural-ODE models exhibit unrealistically high performance in simulation, which significantly deteriorates in experiments. In contrast, the noise-aware training provided by Neural-SDEs maintains high performance on physical devices, demonstrating effective exploitation of node dynamics and robustness during device transfer. (d) Performance of the Neural-SDE models on neuroprosthetic gesture recognition, demonstrating the framework’s potential in addressing real-world tasks. The black line represents the error as a percentage across iterations. The inset shows the final performance, comparing simulation results with those after transfer to the physical device.
  • Figure 4: Cascade learning and Mackey–Glass future prediction task. (a) Schematic overview of the methodology employed for sequentially training network layers with intermediate data gathering. The boxes represent steps performed in simulations, with red shading indicating ASVI twins/experiments in the first layer (L1) and blue shading representing NRAs in the second layer (L2). Initially, a single ASVI layer is connected to a simulated output neuron and trained for the regression task. Once trained, the connectivity from the input to the ASVI layer is transferred to the physical device. Experimental data are then collected to serve as input for training the connectivity to the subsequent layer, consisting of NRAs. This process can, in principle, be extended to accommodate any number of layers. Retraining the digital twin is not required; intermediate data are used solely to adjust the connectivity between the new and the previous layer. (b) Mean-squared error between ground truth and experimental network predictions for the Mackey–Glass future prediction task as the number of future steps increases. Circles/squares represent networks with two/three hidden layers, while dark/light colors compare direct training of the entire network to networks trained using cascade learning, as presented in panel (a). Comparison between model prediction and ground-truth data for the five-timestep future prediction of the Mackey–Glass equation in (c) two-layer and (d) three-layer networks. White circles represent the ground truth data, red lines show the transferred PNN prediction, and pink shading indicates the difference between the ground truth and the network prediction.
  • Figure 5: Modelling of simulated and experimental dynamical systems. Panel (a) illustrates a simulated example of a partial derivative of the acceleration ($f_2$) with respect to position for the Duffing oscillator. The surface represents the partial derivative as position ($x_1(t)$) and velocity ($x_2(t)$) of the system vary. The example trajectories compare the gradient over time, calculated both analytically (red) and via differentiation of the Neural-SDE model (black), for two input sequences, showing excellent agreement. Panel (b) provides a more general view of the model’s ability to act as a surrogate for device gradients; here, we adopt eligibility traces that accumulate gradient information (see Main text and Supplementary Information for more details). The difference between the desired and modelled eligibility traces increases over time due to error accumulation. Panel (c) compares responses generated via the Neural-SDE model (black and yellow lines) and experimentally gathered data of the NRA device (red lines, white circles) for 100 repetitions of a random input sequence. Panel (d) illustrates the Neural-SDE's ability to model the high-dimensional, experimentally measured responses of an artificial spin-vortex ice (ASVI) device. Here, the x-axis corresponds to the different output dimensions of the device responses, while the colours reflect the temporal evolution. Even for this multivariate system, the model (coloured lines) accurately captures the system behaviour (dots).
  • ...and 8 more figures
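
The gradient comparison described for Figure 5(a) can be sketched on a small scale. The snippet below assumes the standard form of the Duffing oscillator, with acceleration $f_2 = -\delta x_2 - \alpha x_1 - \beta x_1^3 + u$, so that $\partial f_2 / \partial x_1 = -\alpha - 3\beta x_1^2$. The parameter values are illustrative assumptions, not those used in the paper, and a central finite difference stands in for differentiating a trained Neural-SDE model, which is what the figure actually compares against.

```python
def duffing_accel(x1, x2, u, alpha=-1.0, beta=1.0, delta=0.3):
    # Acceleration f2 of a standard Duffing oscillator with external drive u.
    # alpha, beta, delta are assumed illustrative values.
    return -delta * x2 - alpha * x1 - beta * x1 ** 3 + u


def analytic_df2_dx1(x1, alpha=-1.0, beta=1.0):
    # Closed-form partial derivative of f2 with respect to position x1.
    return -alpha - 3.0 * beta * x1 ** 2


def numeric_df2_dx1(x1, x2, u, eps=1e-5):
    # Central finite difference in x1; a stand-in for the gradient that
    # would be obtained by differentiating a learned surrogate model.
    upper = duffing_accel(x1 + eps, x2, u)
    lower = duffing_accel(x1 - eps, x2, u)
    return (upper - lower) / (2.0 * eps)


# Compare the two gradients at a few positions for fixed velocity and drive.
for x1 in (-1.5, 0.0, 0.7):
    a = analytic_df2_dx1(x1)
    n = numeric_df2_dx1(x1, x2=0.2, u=0.5)
    print(f"x1={x1:+.1f}  analytic={a:+.6f}  numeric={n:+.6f}")
```

Note that the derivative depends only on position here, because velocity and drive enter $f_2$ linearly; a learned surrogate has no such guarantee, which is why checking it against an analytic reference is informative.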