
pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization

Kiran Magar, Shreya Bharathan, Utsav Banerjee

TL;DR

This work addresses scalable combinatorial optimization by implementing a 2048-p-bit probabilistic computing accelerator (pc-COP) on a Xilinx UltraScale+ FPGA. It introduces a logarithmic adder tree for fast sum-of-products, an approximate yet accurate activation function, and a pseudo-parallel speculate-and-select p-bit update to accelerate convergence. The design achieves near-$99\%$ average accuracy on G-Set max-cut benchmarks up to 2000 nodes, with competitive resource usage compared to prior FPGA approaches, and demonstrates practical throughput (milliseconds per instance) at 100 MHz. Overall, the results validate FPGA-based probabilistic computing as a viable, room-temperature, quantum-inspired approach for large-scale COP solvers and motivate extensions to larger graphs and other COPs.
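
The logarithmic adder tree can be illustrated with a small Python sketch. This is not the authors' RTL. It assumes bipolar spins $m_j \in \{-1, +1\}$, so each weight-times-spin product reduces to a conditional negation of the weight, and it sums the products pairwise. A 2048-input sum then needs $\log_2 2048 = 11$ adder levels instead of a chain of 2047 sequential additions. The function names `signed_products` and `adder_tree` and the random weights are hypothetical.

```python
import random
from typing import List, Tuple


def signed_products(weights: List[int], spins: List[int]) -> List[int]:
    """Form the terms J_ij * m_j for one p-bit i.

    Assumption: spins are bipolar (+1/-1), so the "multiplier" is a
    conditional negation of the weight rather than a full multiplier.
    """
    return [w if s == 1 else -w for w, s in zip(weights, spins)]


def adder_tree(values: List[int]) -> Tuple[int, int]:
    """Sum values with a pairwise (logarithmic-depth) reduction.

    Returns (total, levels). Each loop iteration models one adder-tree level.
    An odd leftover element is passed through to the next level unchanged.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        nxt = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return (level[0] if level else 0), levels


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2048  # number of p-bits, as in pc-COP
    J_row = [rng.randint(-3, 3) for _ in range(n)]  # hypothetical weights for one p-bit
    m = [rng.choice((-1, 1)) for _ in range(n)]     # current spin states
    total, levels = adder_tree(signed_products(J_row, m))
    assert total == sum(j * s for j, s in zip(J_row, m))
    print("levels:", levels)  # 2048 inputs -> 11 levels
```

The tree trades area (roughly one adder per input) for latency that grows with $\log_2 N$ rather than $N$, which is why a tree-based sum-of-products suits a fully connected 2048-p-bit array.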

Abstract

Probabilistic computing is an emerging quantum-inspired computing paradigm capable of solving combinatorial optimization and other classes of computationally hard problems. In this work, we present pc-COP, an efficient and configurable probabilistic computing hardware accelerator with 2048 fully connected probabilistic bits (p-bits) implemented on a Xilinx UltraScale+ FPGA. We propose a pseudo-parallel p-bit update architecture with speculate-and-select logic that improves overall performance by $4 \times$ compared to the traditional sequential p-bit update. Using our FPGA-based accelerator, we solve the standard G-Set graph maximum cut (max-cut) benchmarks with near-99% average accuracy. Compared to state-of-the-art hardware implementations, we achieve similar performance and accuracy with lower FPGA resource utilization.
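
The abstract does not spell out how speculate-and-select works internally, so the Python sketch below shows only one plausible reading, labelled here as an assumption. While p-bit $i$ is being sampled, the local field of p-bit $i+1$ is computed for both possible outcomes $m_i = \pm 1$, and the matching field is selected once $m_i$ is known. The sketch uses the standard p-bit rule $m_i = \mathrm{sgn}(\tanh(\beta I_i) - r)$ with $r$ uniform in $(-1, 1)$ and maps max-cut to antiferromagnetic couplings $J_{ij} = -w_{ij}$. Given the same random numbers, the pairwise speculative sweep should reproduce a plain sequential sweep exactly, which is what the usage example checks. All function names and the annealing schedule are hypothetical.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (u, v, weight)


def maxcut_couplings(n: int, edges: List[Edge]) -> List[List[int]]:
    # Max-cut as an antiferromagnetic Ising model: J_ij = -w_ij, J_ii = 0
    J = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        J[u][v] -= w
        J[v][u] -= w
    return J


def cut_value(edges: List[Edge], m: List[int]) -> int:
    # An edge is cut when its endpoints have opposite spins
    return sum(w for u, v, w in edges if m[u] != m[v])


def field(J: List[List[int]], m: List[int], i: int, skip: int = -1) -> int:
    # Local field I_i = sum_j J_ij m_j, optionally omitting p-bit `skip`
    return sum(J[i][j] * m[j] for j in range(len(m)) if j != skip)


def pbit(I: int, beta: float, r: float) -> int:
    # Standard p-bit rule: m = sgn(tanh(beta * I) - r), with r ~ U(-1, 1)
    return 1 if math.tanh(beta * I) >= r else -1


def sweep_sequential(J, m, beta, rands):
    # Baseline: update p-bits one at a time, each seeing all earlier updates
    m = list(m)
    for i in range(len(m)):
        m[i] = pbit(field(J, m, i), beta, rands[i])
    return m


def sweep_speculate_select(J, m, beta, rands):
    # Hypothetical pairwise scheme; assumes an even number of p-bits
    m = list(m)
    for i in range(0, len(m), 2):
        k = i + 1
        I_i = field(J, m, i)
        # Speculate: field of p-bit k for both possible new values of m_i
        base_k = field(J, m, k, skip=i)
        I_k_plus, I_k_minus = base_k + J[k][i], base_k - J[k][i]
        m[i] = pbit(I_i, beta, rands[i])
        # Select the speculative field that matches the actual outcome of m_i
        m[k] = pbit(I_k_plus if m[i] == 1 else I_k_minus, beta, rands[k])
    return m


if __name__ == "__main__":
    rng = random.Random(1)
    n = 16
    edges = [(u, v, 1) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.3]
    J = maxcut_couplings(n, edges)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    for sweep in range(50):
        beta = 0.1 + 0.05 * sweep  # simple linear annealing schedule (assumption)
        rands = [rng.uniform(-1, 1) for _ in range(n)]
        seq = sweep_sequential(J, m, beta, rands)
        m = sweep_speculate_select(J, m, beta, rands)
        assert m == seq  # same random numbers -> identical states
    print("cut:", cut_value(edges, m), "of", len(edges), "edges")
```

Integer weights keep the speculative and sequential fields bit-identical, so the equivalence check is exact. Extending the window beyond two p-bits would require more speculative candidate fields per p-bit.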

Paper Structure

This paper contains 11 sections, 4 equations, 13 figures, 3 tables, 1 algorithm.

Figures (13)

  • Figure 1: Three computing paradigms: classical, probabilistic and quantum chowdhury_fullstack_2023.
  • Figure 2: Overview of p-bit operation as a binary stochastic neuron jain_tyche_2023.
  • Figure 3: Top-level architecture of the proposed pc-COP accelerator.
  • Figure 4: Logarithmic adder tree and multiplier circuits for p-bit weight logic.
  • Figure 5: Implementations of the activation function: (a) lookup table-based $\tanh$ and $2 \times \mathrm{sigmoid} - 1$ (threshold $T = 4$), and (b) piece-wise linear approximations $A_1$, $A_2$ and $A_4$ (thresholds $T = 1$, $2$ and $4$, respectively).
  • ...and 8 more figures
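
Figure 5 contrasts lookup-table activations with piece-wise linear approximations. The Python sketch below is a hypothetical illustration, not the paper's circuit. `lut_activation` saturates to $\pm 1$ outside $[-T, T]$ and returns the nearest tabulated value inside. `pwl_activation` assumes $A_T$ is a single linear ramp $x/T$ clamped to $\pm 1$, because the exact segment layout of $A_1$, $A_2$ and $A_4$ is not given here. Note the identity $2\,\sigma(x) - 1 = \tanh(x/2)$: a $2 \times \mathrm{sigmoid} - 1$ table behaves like $\tanh$ with its input halved. With $r$ uniform in $(-1, 1)$, any activation $a(\cdot)$ taking values in $[-1, 1]$ gives $P(m_i = +1) = (1 + a(\beta I_i))/2$, which the sampling loop estimates empirically.

```python
import functools
import math
import random
from typing import Callable, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_lut(fn: Callable[[float], float], T: float, entries: int) -> List[float]:
    # Sample fn at `entries` evenly spaced points covering [-T, T]
    step = 2 * T / (entries - 1)
    return [fn(-T + k * step) for k in range(entries)]


def lut_activation(x: float, lut: List[float], T: float) -> float:
    # Saturate outside the threshold; otherwise use the nearest table entry
    if x <= -T:
        return -1.0
    if x >= T:
        return 1.0
    step = 2 * T / (len(lut) - 1)
    return lut[round((x + T) / step)]


def pwl_activation(x: float, T: float) -> float:
    # Hypothetical A_T: linear ramp of slope 1/T, clamped to [-1, 1]
    return max(-1.0, min(1.0, x / T))


def sample_pbit(I: float, beta: float, act: Callable[[float], float],
                rng: random.Random) -> int:
    # m = +1 if act(beta * I) >= r with r ~ U(-1, 1), else -1
    return 1 if act(beta * I) >= rng.uniform(-1.0, 1.0) else -1


if __name__ == "__main__":
    T = 4.0
    tanh_lut = build_lut(math.tanh, T, 257)  # 257 entries: step of exactly 1/32
    grid = [k / 100 for k in range(-600, 601)]
    err = max(abs(lut_activation(x, tanh_lut, T) - math.tanh(x)) for x in grid)
    print("max |LUT - tanh| on [-6, 6]:", err)
    print("2*sigmoid(1.3) - 1 vs tanh(0.65):", 2 * sigmoid(1.3) - 1, math.tanh(0.65))
    rng = random.Random(0)
    for T_pwl in (1.0, 2.0, 4.0):
        act = functools.partial(pwl_activation, T=T_pwl)
        ups = sum(sample_pbit(1.0, 1.0, act, rng) == 1 for _ in range(10000))
        # Expected fraction of +1 outcomes is (1 + A_T(1)) / 2
        print(f"A_{int(T_pwl)}: P(+1) ~ {ups / 10000:.3f}, "
              f"expected {(1 + act(1.0)) / 2:.3f}")
```

A ramp needs only a shift or constant multiply and a comparator, while a LUT costs memory per entry. The nearest-entry error of the 257-entry table is bounded by half the step (1/64) because $\tanh$ has slope at most 1.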