Perfecting Imperfect Physical Neural Networks with Transferable Robustness using Sharpness-Aware Training

Tengji Xu, Zeyu Luo, Shaojie Liu, Li Fan, Qiarong Xiao, Benshan Wang, Dongliang Wang, Chaoran Huang

TL;DR

This work offers a transformative, efficient approach to training PNNs, addressing critical challenges in analog computing and enabling real-world deployment through a novel technique called Sharpness-Aware Training (SAT), which is universally applicable across three types of PNNs.

Abstract

AI models are essential in science and engineering, but recent advances are pushing the limits of traditional digital hardware. To address these limitations, physical neural networks (PNNs), which use physical substrates for computation, have gained increasing attention. However, developing effective training methods for PNNs remains a significant challenge. Current approaches, whether offline or online, suffer from significant accuracy loss. Offline training is hindered by imprecise modeling, while online training yields device-specific models that cannot be transferred to other devices because of manufacturing variances. Both methods are also vulnerable to perturbations after deployment, such as thermal drift or alignment errors, which invalidate trained models and force retraining. Here, we address the challenges of both offline and online training through a novel technique called Sharpness-Aware Training (SAT), which leverages the geometry of the loss landscape to tackle the problems in training physical systems. SAT enables accurate training using efficient backpropagation algorithms, even with imprecise models. PNNs trained offline by SAT even outperform those trained online, despite modeling and fabrication errors. SAT also overcomes online training limitations by enabling reliable transfer of models between devices. Finally, SAT is highly resilient to perturbations after deployment, allowing PNNs to keep operating accurately under perturbations without retraining. We demonstrate SAT across three types of PNNs, showing it is universally applicable, regardless of whether the models are explicitly known. This work offers a transformative, efficient approach to training PNNs, addressing critical challenges in analog computing and enabling real-world deployment.
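The abstract does not spell out the SAT update rule. As a rough illustration of what "leveraging the geometry of the loss landscape" can look like, the sketch below uses the generic two-step sharpness-aware minimization update (perturb the weights toward higher loss within a small ball, then descend using the gradient measured there) on a toy linear model. The model, the function names `loss_and_grad` and `sam_step`, and the values of `lr` and `rho` are all assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean squared error of a toy linear model and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One generic sharpness-aware update (assumed form, two gradient evaluations)."""
    _, g = loss_and_grad(w, X, y)
    # Ascent step: approximate worst-case perturbation inside an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    _, g_perturbed = loss_and_grad(w + eps, X, y)
    # Descent step applied to the original (unperturbed) weights.
    return w - lr * g_perturbed

# Hypothetical toy data: noiseless linear targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
```

Because the descent direction is measured at a perturbed point, the update favors regions where the loss stays low across a neighborhood of the weights, which is the property the abstract associates with tolerance to modeling errors and post-deployment perturbations.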

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures.

Figures (4)

  • Figure 1: Detailed comparison of physical neural network training methods and the principle of the proposed Sharpness-Aware Training method. (a) Diagram illustrating a typical neural network. The neural network contains synapses to perform matrix-vector multiplications followed by nonlinear activation functions. (b) Schematic diagram of a programmable physical system with control parameters $\Theta$ and tunable weights $W$. The tunable weights $W$ are directly controlled by the control parameters $\Theta$. The system takes input signals and produces output signals based on the adjusted parameters. (c) Illustration of offline training and online training methods. The tunable weights are achieved by separately controlling different physical parameters including currents and phases. (d) Proposed Sharpness-Aware Training scheme. The training goals include reducing loss while increasing component and system stability, respectively. (e) Schematic diagram of the parameter update in the proposed Sharpness-Aware Training scheme. (f) Performance comparison between offline training, online training, and Sharpness-Aware Training. MRR: Micro-ring resonator. MZI: Mach-Zehnder interferometer. NN: Neural network. FP: Forward propagation. BP: Backpropagation.
  • Figure 2: Experimental results in MRR-based PNNs. (a) Implementation of matrix multiplication on an MRR weight bank. (b) Picture of the manufactured MRR weight bank. (c) Single MRR spectrum changes through thermal tuning. (d) Theoretical MRR tuning curve model and experimentally measured MRR tuning curves. (e) Loss and classification change versus training epochs. (f) Current distribution contrast histogram. (g) Trained model loss landscapes. (h) Experimental inference accuracy with temperature change from 21°C to 23°C. (j) Experimentally measured inference accuracy without using TEC. MZM: Mach-Zehnder modulator. MRR: Micro-ring resonator. SOI: Silicon on insulator. PD: Photodetector. BP: Backpropagation. SAT: Sharpness-Aware Training. TEC: Thermoelectric control.
  • Figure 3: Experimental results in diffractive optics-based NNs. (a) Decomposition of neural network inference into multiple matrix multiplications. (b) Deployment of matrix multiplication in free-space optical computing systems. (c) Schematic diagram and picture of the experimental setup. (d) Detailed SAT optimization process for non-differentiable parameters. (e-i) Experimental inference accuracy with rotation angle change from 0.0° to 2.0°. (e-ii) Experimental inference accuracy with shift pixel number change from 0.0 to 2.0. (e-iii) Experimental inference accuracy with scaling number change from 1.0 to 1.1. OLED: Organic light-emitting diode. SLM: Spatial light modulator. MS: Microscope. PBS: Polarization beam splitter. HW: Half-wave plate. SAT: Sharpness-Aware Training. BP: Backpropagation.
  • Figure 4: Simulation results in MZI-based PNNs. (a) Schematic diagram of the digital-optical hybrid network and illustration of manufacturing error. (b) Schematic diagram for implementing SAT in online training. (c) Performance comparison of training results. (d) Inference performance comparison when transferring trained parameters to devices with different error variances. MZI: Mach-Zehnder interferometer. PNN: Photonic neural network. SOI: Silicon-on-insulator. PD: Photodetector. SAT: Sharpness-Aware Training. PAT: Physical-aware training. DAT: Dual-adaptive training. BS: Beam splitter.
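
Figure 4(d) concerns transferring trained parameters to devices with different error variances. The toy sketch below shows why the sharpness of a minimum matters for that kind of transfer: it averages the loss over simulated devices whose weights deviate from the trained values by Gaussian noise. The quadratic loss, the Gaussian error model, and the names `quadratic_loss` and `mean_loss_under_variation` are assumptions made for illustration; they are not the paper's MZI simulation.

```python
import numpy as np

def quadratic_loss(w, curvature):
    """Toy loss 0.5 * sum(curvature * w**2), with its minimum at w = 0."""
    return 0.5 * np.sum(curvature * w ** 2)

def mean_loss_under_variation(w, curvature, sigma, n_devices=1000, seed=0):
    """Average loss over simulated devices whose weights deviate by N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    # One row of weight deviations per hypothetical device.
    noise = rng.normal(scale=sigma, size=(n_devices, w.size))
    return float(np.mean([quadratic_loss(w + d, curvature) for d in noise]))

w_opt = np.zeros(4)          # both toy landscapes share the same minimizer
flat = np.full(4, 0.1)       # low curvature: a flat minimum
sharp = np.full(4, 10.0)     # high curvature: a sharp minimum

flat_loss = mean_loss_under_variation(w_opt, flat, sigma=0.1)
sharp_loss = mean_loss_under_variation(w_opt, sharp, sigma=0.1)
```

Both minima have zero loss on the nominal device, but under the same device-to-device variation the sharp minimum incurs a much larger average loss than the flat one, which is the intuition behind preferring flat minima when parameters must transfer across imperfect hardware.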