
Training deep physical neural networks with local physical information bottleneck

Hao Wang, Ziao Wang, Xiangpeng Liang, Han Zhao, Jianqi Hu, Junjie Jiang, Xing Fu, Jianshi Tang, Huaqiang Wu, Sylvain Gigan, Qiang Liu

TL;DR

This work presents the Physical Information Bottleneck (PIB), a general and efficient framework that integrates information theory and local learning so that deep physical neural networks (PNNs) can learn under arbitrary physical dynamics. The authors demonstrate supervised, unsupervised, and reinforcement learning with PIB across electronic memristive chips and optical computing platforms.

Abstract

Deep learning has revolutionized modern society but faces growing energy and latency constraints. Deep physical neural networks (PNNs) are interconnected computing systems that directly exploit analog dynamics for energy-efficient, ultrafast AI execution. Realizing this potential, however, requires universal training methods tailored to physical intricacies. Here, we present the Physical Information Bottleneck (PIB), a general and efficient framework that integrates information theory and local learning, enabling deep PNNs to learn under arbitrary physical dynamics. By allocating matrix-based information bottlenecks to each unit, we demonstrate supervised, unsupervised, and reinforcement learning across electronic memristive chips and optical computing platforms. PIB also adapts to severe hardware faults and allows for parallel training via geographically distributed resources. Bypassing auxiliary digital models and contrastive measurements, PIB recasts PNN training as an intrinsic, scalable information-theoretic process compatible with diverse physical substrates.
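The phrase "matrix-based information bottleneck" can be made concrete with one widely used estimator: the matrix-based Rényi α-order entropy of Sánchez Giraldo, Rao and Príncipe, which computes entropies from the eigenvalues of unit-trace Gram matrices over a mini-batch. The sketch below assumes that estimator, a Gaussian kernel with a median-heuristic width, a class-equality kernel for labels, and the Lagrangian form $-I(Z;Y) + \beta I(X;Z)$ as a unit's local loss. The function names and the value of `beta` are hypothetical, not taken from the paper. NumPy only evaluates the objective here; optimizing a physical unit would need an autodiff framework or the paper's own procedure.

```python
import numpy as np

def gaussian_gram(x, sigma=None):
    """Unit-trace Gram matrix of a batch under a Gaussian kernel."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    if sigma is None:
        # Median heuristic for the kernel width (an assumption, not from the paper).
        pos = d2[d2 > 0]
        sigma = np.sqrt(0.5 * np.median(pos)) if pos.size else 1.0
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)

def label_gram(y):
    """Unit-trace Gram matrix for class labels: k_ij = 1 if y_i == y_j, else 0."""
    y = np.asarray(y)
    k = (y[:, None] == y[None, :]).astype(float)
    return k / np.trace(k)

def renyi_entropy(a, alpha=1.01):
    """Matrix-based Renyi alpha-entropy (in bits) of a unit-trace PSD matrix."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))

def mutual_info(a, b, alpha=1.01):
    """I(A;B) = S(A) + S(B) - S(A o B / tr(A o B)), with o the Hadamard product."""
    ab = a * b
    joint = renyi_entropy(ab / np.trace(ab), alpha)
    return renyi_entropy(a, alpha) + renyi_entropy(b, alpha) - joint

def pib_loss(x, z, y, beta=0.1, alpha=1.01):
    """Hypothetical per-unit IB loss: keep label information in Z, compress X."""
    ax, az, ay = gaussian_gram(x), gaussian_gram(z), label_gram(y)
    return -mutual_info(az, ay, alpha) + beta * mutual_info(ax, az, alpha)

# Usage on synthetic data: z is a stand-in for a measured unit output.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 20))
y = rng.integers(0, 4, size=64)
z = np.tanh(x @ rng.normal(size=(20, 8)))
print(pib_loss(x, z, y))
```

Lowering this loss raises the label information carried by the unit's output while penalizing how much of the raw input it retains, which is the "retain task-relevant, filter task-irrelevant" behavior the framework describes.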

Paper Structure

This paper contains 3 sections and 4 figures.

Figures (4)

  • Figure 1: Overview of deep physical neural networks (PNNs) trained with physical information bottleneck (PIB). (A) A physical computing unit leverages intrinsic analog physical dynamics (indicated by illustrative trajectories) to transform an input state into an output state. These units can be built on various physical substrates, such as electronics and optics. They fall into two classes based on their mathematical abstraction: isomorphic units, which are designed to emulate explicit and reconfigurable mathematical operators, and broken-isomorphism (or non-isomorphic) units, which exploit native, fixed physical dynamics augmented by a trainable layer. (B) Similar to the hierarchical architecture of digital deep neural networks (DNNs), deep PNNs are constructed by cascading individual physical computing units. (C) To achieve the desired functionalities, each unit is optimized locally via the PIB objective. This framework encourages the output feature of each unit to retain task-relevant information (e.g., information needed for target prediction) while filtering out task-irrelevant information from the original input. The PIB training dynamics are schematically illustrated by the information Venn diagram (right), where the varying orange regions denote the matrix-based mutual information shared between the learned feature, the input, and the target.
  • Figure 2: Deep PNNs with isomorphic computations based on electronic memristors. (A) The unit implements matrix-vector multiplication (MVM) via memristive crossbars. Panels show the packaged system, chip micrograph, and cross-sectional transmission electron microscopy image of one-transistor-one-resistor cells. The histogram shows shot-to-shot noise at a conductance of 23.4 $\mu$S. (B) PIB training workflow. (Right) Different numbers of layers are used for datasets of varying complexity (visualized via UMAP). (Left) The PIB training cycle includes local optimization, experimental deployment, and the forward pass to subsequent units. (C) Training dynamics showing weight evolution (top) and UMAP feature separation (bottom) over iterations. (D) System performance. (Left) Experimental PIB accuracy compared with in silico training benchmarks (mean $\pm$ s.d. across six runs). (Right) Demonstration of PIB's adaptation to severe hardware faults (a broken analog-to-digital converter).
  • Figure 3: Deep PNNs with broken-isomorphism computations based on optics. (A) An optical processor maps encoded input fields to speckle features through multiple scattering. This unknown transformation breaks gradient flow, preventing standard global backpropagation. PIB optimizes each unit with a local objective, enabling unit-wise training without transmission-matrix retrieval or differentiable surrogates. (B) Training dynamics on Fashion-MNIST. Information plane trajectories exhibit a "fit-then-compress" behavior ($I(Y;Z)\uparrow$ then $I(X;Z)\downarrow$). (Bottom) Readout training curves (x-axis: Epoch) show that increasing the number of PIB iterations (color gradient) accelerates convergence and improves final accuracy. (C) PIB improves generalization over gradient-based (PAT), error-transport (DFA), and contrastive (PhyLL) baselines in noise robustness, data efficiency, and out-of-distribution (OOD) detection. Shaded regions indicate the spread across multiple runs.
  • Figure 4: Versatility of the PIB framework in unsupervised learning, reinforcement learning, and decentralized parallel training regimes. (A) A memristor-based deep PNN is trained by PIB's unsupervised learning variant. Increasing the number of PIB training iterations enhances the features' linear separability, as shown by the improved accuracy of a downstream linear probe. (Inset) Semantically disentangled UMAP visualization. (B) Reinforcement learning on an optics-based deep PNN agent. The agent learns the CartPole-v1 control task via a local loss, a distinct departure from standard RL. By minimizing a composite local objective of PIB and temporal-difference Q-learning loss, the episode return (Score) and its running average (Avg) increase until they reach the solved regime. (C) Decentralized parallel training of a hybrid optical-electronic deep PNN. Physical computing units are geographically distributed across two sites, and PIB allows for simultaneous, decoupled optimization using local computing resources. (Right) Parallel training loss curves for both units demonstrate convergence of their respective PIB objectives without inter-site communication or synchronization. (Bottom) Downstream task results after parallel training.
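The two unit classes in Figures 1 to 3 can be illustrated with toy forward models. This is a simulation sketch under stated assumptions, not a model of the actual hardware. The isomorphic unit maps a signed weight matrix onto differential conductance pairs (a common crossbar scheme, assumed here) with additive Gaussian read noise whose magnitude `noise_std` is a free parameter. The broken-isomorphism unit applies a trainable phase mask, then a fixed random complex matrix standing in for the unknown scattering medium, then intensity detection. The names `crossbar_mvm`, `random_transmission_matrix` and `speckle_unit`, the full-scale conductance `g_max`, and the placement of the trainable layer are all hypothetical.

```python
import numpy as np

def crossbar_mvm(w, x, g_max=100e-6, noise_std=0.0, rng=None):
    """Isomorphic unit (toy model): y ~ W x computed on a memristive crossbar."""
    rng = np.random.default_rng() if rng is None else rng
    scale = g_max / max(np.max(np.abs(w)), 1e-12)   # weight units -> siemens
    g_pos = np.clip(w, 0.0, None) * scale           # positive part on one device
    g_neg = np.clip(-w, 0.0, None) * scale          # negative part on its partner device
    if noise_std > 0:
        # Shot-to-shot read noise; physical conductances cannot go negative.
        g_pos = np.clip(g_pos + rng.normal(0.0, noise_std, g_pos.shape), 0.0, None)
        g_neg = np.clip(g_neg + rng.normal(0.0, noise_std, g_neg.shape), 0.0, None)
    currents = (g_pos - g_neg) @ x                  # Ohm's law, summed along each row line
    return currents / scale                         # convert back to weight units

def random_transmission_matrix(n_out, n_in, rng):
    """Stand-in for an unknown scattering medium: i.i.d. complex Gaussian entries."""
    t = rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))
    return t / np.sqrt(2.0 * n_in)

def speckle_unit(x, phase_mask, t_matrix):
    """Broken-isomorphism unit (toy model): trainable phase -> fixed scattering -> intensity."""
    field_in = np.exp(1j * (np.pi * x + phase_mask))  # phase-encode input plus trainable offset
    field_out = t_matrix @ field_in                   # treated as a black box during training
    return np.abs(field_out) ** 2                     # camera records speckle intensity

# Cascade the two units, as in a hybrid deep PNN.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=16)
w = rng.normal(size=(32, 16))
h = crossbar_mvm(w, x, noise_std=1e-6, rng=rng)
h = (h - h.min()) / (h.max() - h.min() + 1e-12)       # rescale to [0, 1] for phase encoding
t = random_transmission_matrix(64, 32, rng)
speckle = speckle_unit(h, np.zeros(32), t)
```

Only `phase_mask` would be trained in the optical unit; `t_matrix` is never read out, which mirrors the caption's point that PIB needs neither transmission-matrix retrieval nor a differentiable surrogate of the medium.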
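Figure 4B describes a composite local objective that combines a PIB term with a temporal-difference (TD) Q-learning loss. A minimal sketch, assuming a standard one-step Q-learning target $r + \gamma (1 - d) \max_{a'} Q(s', a')$, a mean-squared TD error, and a hypothetical weight `lam` on a precomputed PIB value. How the paper defines the PIB term in the RL setting, and how the two terms are balanced, is not stated in this summary, so both are placeholders here.

```python
import numpy as np

def td_targets(rewards, next_q, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * (1 - done) * max_a' Q(s', a')."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

def td_loss(q, actions, rewards, next_q, dones, gamma=0.99):
    """Mean-squared TD error on the Q-values of the actions actually taken."""
    q_taken = q[np.arange(len(actions)), actions]
    return float(np.mean((q_taken - td_targets(rewards, next_q, dones, gamma)) ** 2))

def composite_loss(q, actions, rewards, next_q, dones, pib_term, lam=0.1, gamma=0.99):
    """Hypothetical composite local objective: TD loss + lam * PIB term."""
    return td_loss(q, actions, rewards, next_q, dones, gamma) + lam * pib_term

# Toy batch of two CartPole-style transitions (2 actions).
q = np.array([[1.0, 2.0], [0.5, 0.0]])        # Q(s, .) from the agent's readout
next_q = np.array([[0.0, 3.0], [1.0, 2.0]])   # Q(s', .), e.g. from a target network
actions = np.array([1, 0])
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
loss = composite_loss(q, actions, rewards, next_q, dones, pib_term=2.0, lam=0.1, gamma=0.5)
print(loss)  # 0.45
```

Here the TD targets are [2.5, 0.0], the taken Q-values are [2.0, 0.5], so the TD loss is 0.25 and the weighted PIB term adds 0.2. The `(1 - dones)` factor stops bootstrapping past a terminal state such as the pole falling.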