Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang

TL;DR

This work demonstrates that a CMOS Ising-machine substrate, when co-designed with algorithms for energy-based models, can accelerate RBM training and inference. It introduces two designs: a Gibbs-sampler accelerator and a Boltzmann gradient follower that performs gradient updates in hardware, achieving about $29\times$ speedup and about $1000\times$ lower energy than a TPU on RBM benchmarks. The results show robust performance under realistic noise and process variations, supporting the practicality of nature-inspired hardware for energy-efficient learning. The study highlights the potential of hardware-software co-design to unlock efficient, specialized accelerators for EBMs, while noting the tradeoffs in generality and scalability compared to fully general-purpose architectures.

Abstract

Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of this effort that has seen rapid recent advances is Ising machines. Some Ising machines are already showing better performance and energy efficiency for optimization problems. Through design iterations and co-evolution between hardware and algorithm, we expect more benefits from nature-based computing systems. In this paper, we make a case for an augmented Ising machine suitable for both training and inference using an energy-based machine learning algorithm. We show that with a small change, the Ising substrate can accelerate key parts of the algorithm and achieve non-trivial speedup and efficiency gains. With a more substantial change, we can turn the machine into a self-sufficient gradient follower that completes training virtually entirely in hardware. This can bring about 29x speedup and about 1000x reduction in energy compared to a Tensor Processing Unit (TPU) host.
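
For readers unfamiliar with the workload, the sketch below shows in software the kind of computation the abstract refers to: block Gibbs sampling in an RBM and a gradient estimate built from it. It is a textbook contrastive-divergence (CD-1) illustration in NumPy, which is an assumption here, since the text above does not state the exact training rule or hardware mapping. All names (`gibbs_step`, `cd1_gradients`, `W`, `b`, `c`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep: sample hidden units given v, then visible units given h.

    v: (batch, n_visible) binary array
    W: (n_visible, n_hidden) coupling weights
    b: (n_visible,) visible biases, c: (n_hidden,) hidden biases
    """
    p_h = sigmoid(v @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(np.float64)
    p_v = sigmoid(h @ W.T + b)
    v_new = (rng.random(p_v.shape) < p_v).astype(np.float64)
    return v_new, h


def cd1_gradients(v0, W, b, c, rng):
    """CD-1 estimate: data-phase statistics minus statistics after one Gibbs sweep."""
    p_h0 = sigmoid(v0 @ W + c)            # positive (data) phase
    v1, _ = gibbs_step(v0, W, b, c, rng)  # one sampling sweep
    p_h1 = sigmoid(v1 @ W + c)            # negative (model) phase
    n = v0.shape[0]
    dW = (v0.T @ p_h0 - v1.T @ p_h1) / n
    db = (v0 - v1).mean(axis=0)
    dc = (p_h0 - p_h1).mean(axis=0)
    return dW, db, dc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((6, 4))  # toy sizes, chosen arbitrarily
    b, c = np.zeros(6), np.zeros(4)
    v0 = (rng.random((8, 6)) < 0.5).astype(np.float64)
    dW, db, dc = cd1_gradients(v0, W, b, c, rng)
    W += 0.1 * dW  # one gradient-ascent step with an arbitrary learning rate
    print(dW.shape, db.shape, dc.shape)
```

In this software version, the sampling sweep and the weight update are separate steps run by a conventional processor. The two designs summarized above target, respectively, the sampling step and the full update loop.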

Paper Structure

This paper contains 31 sections, 12 equations, 16 figures, 5 tables, 1 algorithm.

Figures (16)

  • Figure 1: Restricted Boltzmann machine (RBM)
  • Figure 2: High-level BRIM showing bistable capacitive nodes with programmable resistive coupling, and its programming logic. Note that between every pair of nodes (say, $N_1$ and $N_2$), we show only one bi-directional coupling unit ($CU_{1,2}$), resulting in an upper triangular coupling network. In an equivalent implementation, the coupling unit may consist of two uni-directional parts, forming a symmetric layout.
  • Figure 3: High-level RBM showing visible and hidden nodes, with clamping units to drive node biases, coupling mesh, and programming logic.
  • Figure 4: An architecture diagram of Boltzmann gradient follower.
  • Figure 5: Execution time normalized to that of BGF for different RBMs and image batch size of 500.
  • ...and 11 more figures