Training of Physical Neural Networks

Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Marković, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu, Demetri Psaltis, Andrea Alù, Romain Fleury

TL;DR

Methods to train physical neural networks, such as backpropagation-based and backpropagation-free approaches, are explored to allow scaling up of artificial intelligence models far beyond present small-scale laboratory demonstrations, potentially enhancing computational efficiency.

Abstract

Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. Doing so will, however, require rethinking both how AI models work and how they are trained, primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods, including backpropagation-based and backpropagation-free approaches, are now being explored. These methods have various trade-offs, and so far no method has been shown to match the scale and performance of the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized both to create more efficient realizations of current-scale AI models and to enable unprecedented-scale models.

Paper Structure (17 sections, 4 figures)

Figures (4)

  • Figure 1: (a) Physical Neural Networks (PNNs), processing input data $\vec{x}$ using trainable parameters $\vec{\theta}$. PNNs can be constructed to realize computations isomorphic to those commonly found in artificial neural networks, such as matrix-vector multiplications, or can sacrifice isomorphism for potential speed/energy advantages, where the physical system is left to perform the computation it most naturally performs. (b) Timeline of training methods for PNNs. The corresponding references of selected key milestones and publications, from left to right: lugt1964signal, goodman1984optical, hopfield1982neural, farhat1985optical, hinton1986learning, hopfield1986computing, psaltis1988adaptive, mead1987neural, mead1990neuromorphic, hasler1994single, bi1998synaptic, song2000competitive, indiveri1999neuromorphic, jaeger2004harnessing, maass2004computational, indiveri2011neuromorphic, larger2012photonic, paquot2012optoelectronic, nokland2016direct, launay2020hardware, scellier2017equilibrium, shen2017deep, lin2018all, romera2018vowel, feldmann2021parallel, stern2021supervised, dillavou2022demonstration, wright2022deep, onodera2024scaling, hinton2022forward, pai2023experimentally, momeni2023backpropagation, oguz2023forward, le202364, lopez2023self, Wanjura2023fully.
  • Figure 2: Training methods for PNNs. ELM: Extreme Learning Machine, RC: Reservoir Computing, DFA: Direct Feedback Alignment, EP: Equilibrium Propagation, HEB: Hamiltonian Echo Backpropagation. For a more detailed comparison, refer to Table 1.
  • Figure 3: Analog large models. (a) The building block of mainstream large language models is the transformer architecture vaswani2017attention, whose main components are the attention layers, multilayer perceptron (MLP) layers, softmax operation and dynamic matrix-vector multiplication (MatMul). The attention layer requires causal pairwise computations between the elements of the sequence, so its computational complexity grows quadratically with sequence length, increasing both time and energy overhead, especially as models process longer context lengths. The MLP layer includes very large weight matrices that also impose a large computational overhead; (b) the MLP, also known as a fully connected layer, is built from vector-matrix multiplications. The MLP can be experimentally realized on a number of technologies such as (c) crossbar arrays le202364; (d) Mach-Zehnder interferometer meshes pai2023experimentally; (e) free-space multipliers anderson2023optical; (f) size scaling of two- and three-dimensional analog models with increasing model parameters, computed at a wavelength of 500 nm with scalings $\lambda^{2/3}$; (g) energy scaling advantage of analog optical matrix-vector multiplication compared to digital electronics, for Transformer models. Data were obtained from anderson2023optical.
  • Figure 4: Emergent technologies. (a) Optical microscopic image of a memristor crossbar array integrated on the memristor/CMOS chip, from cai2019fully; (b) crossbar array of magnetic tunnel junctions for high-density storage and memory retrieval, from grollier2020neuromorphic; (c) schematic of an in-sensor computing architecture, from zhou2020near; (d) operational principle of learned-sensing intelligent meta-imagers, from saigre2022intelligent; (e) illustration of soft quantum neurons in the quantum circuit model, from zhou2023quantum; (f) schematic of a spatial photonic Ising machine, from pierangeli2020noise; (g) diffractive optical NNs consisting of multiple transmissive or reflective layers, where each point on a given layer acts as a neuron with a complex-valued transmission or reflection coefficient, from lin2018all; (h) superradiance in confocal cavity QED for high-density storage and memory retrieval, from marsh2021enhancing; (i) image of a PIC showing the signal paths (white) and the local oscillator paths (blue), from bandyopadhyay2022single; (j) structure of an N-input photonic neuron in which the weights of the input signals are set using optical PIN attenuators and the signals are summed using photodetectors, from ashtiani2022chip; (k) acoustic data transformer in which input data are encoded into the intensity of sound waves at different frequencies that propagate through a random set of membranes, from momeni2023backpropagation, momeni2023physics.
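
The Figure 3 caption contrasts the quadratic cost of attention with the cost of the large MLP weight matrices. The sketch below is only a rough illustration of that contrast, not a calculation from the paper. It counts multiply-accumulate operations (MACs) under textbook assumptions that are not taken from the source: a single layer, model width `d_model`, an MLP hidden width of `expansion * d_model`, and no causal masking, softmax, or query/key/value projection costs. The function names are hypothetical.

```python
def attention_macs(seq_len: int, d_model: int) -> int:
    """MACs for the pairwise part of one attention layer.

    Assumption: one seq_len x seq_len score matrix (Q K^T) plus one
    weighted sum over V, each costing seq_len * seq_len * d_model MACs.
    """
    return 2 * seq_len * seq_len * d_model


def mlp_macs(seq_len: int, d_model: int, expansion: int = 4) -> int:
    """MACs for a two-matrix MLP block applied to every token.

    Assumption: d_model -> expansion * d_model -> d_model, so each token
    costs 2 * d_model * (expansion * d_model) MACs.
    """
    return 2 * seq_len * d_model * expansion * d_model


if __name__ == "__main__":
    d_model = 1024  # illustrative width, not a value from the paper
    for seq_len in (1024, 2048, 4096, 8192):
        print(seq_len, attention_macs(seq_len, d_model), mlp_macs(seq_len, d_model))
```

Under these assumptions, doubling the sequence length quadruples the attention count but only doubles the MLP count, and the two are equal when `seq_len == expansion * d_model`. This is the sense in which longer contexts shift the overhead toward attention.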