Energy-Efficient p-Bit-Based Fully-Connected Quantum-Inspired Simulated Annealer with Dual BRAM Architecture

Naoya Onizawa, Taiga Kubuta, Duckgyu Shin, Takahiro Hanyu

TL;DR

An energy-efficient FPGA architecture for stochastic simulated quantum annealing (SSQA) that combines a spin-serial and replica-parallel update schedule with a dual-BRAM delay-line architecture, enabling scalable support for fully connected Ising models while eliminating fan-out growth in logic resources.

Abstract

Probabilistic bits (p-bits) offer an energy-efficient hardware abstraction for stochastic optimization; however, existing p-bit-based simulated annealing accelerators suffer from poor scalability and limited support for fully connected graphs due to fan-out and memory overhead. This paper presents an energy-efficient FPGA architecture for stochastic simulated quantum annealing (SSQA) that addresses these challenges. The proposed design combines a spin-serial and replica-parallel update schedule with a dual-BRAM delay-line architecture, enabling scalable support for fully connected Ising models while eliminating fan-out growth in logic resources. By exploiting SSQA, the architecture achieves fast convergence using only final replica states, significantly reducing memory requirements compared to conventional p-bit-based annealers. Implemented on a Xilinx ZC706 FPGA, the proposed system solves an 800-node MAX-CUT benchmark and achieves up to 50% reduction in energy consumption and over 90% reduction in logic resources compared with prior FPGA-based p-bit annealing architectures. These results demonstrate the practicality of quantum-inspired, p-bit-based annealing hardware for large-scale combinatorial optimization under strict energy and resource constraints.

Paper Structure (19 sections, 9 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Variation of p-bit-based simulated annealing. Stochastic simulated annealing (SSA) is a p-bit-based simulated annealing (SA) method approximated using stochastic computing. Stochastic simulated quantum annealing (SSQA) is an alternative p-bit-based SA approach that utilizes replicas of a spin network to mimic quantum annealing on a classical computer.
  • Figure 2: An SSQA spin network consisting of $N \times R$ p-bits, where each replica consists of $N$ p-bits. Each replica of the spin network is based on the Ising model, and adjacent layers are connected through interaction coefficients $Q$.
  • Figure 3: Evolution of the interaction constant $Q(t)$ over the annealing steps in the SSQA method, illustrating how the coupling strength between replicas is gradually increased. The optimization process is guided by this schedule: low $Q(t)$ values allow independent exploration within each replica, while higher $Q(t)$ values enhance inter-replica coupling, encouraging convergence toward the global minimum of the energy function $H$. This mechanism enables efficient solution search via quantum-inspired tunneling behavior.
  • Figure 4: Spin-serial and replica-parallel architecture of the proposed SSQA hardware. The design comprises $R$ spin gate circuits that are reused $N$ times to compute a total of $R \times N$ spins. The spin-serial structure reduces wiring complexity, while the replica-parallel structure allows concurrent access to shared weights, enabling efficient and scalable computation for fully connected graphs.
  • Figure 5: Spin-serial spin gate circuit implementing the update rule in Eq. (6). At each clock cycle, the output of one spin and the corresponding weight $J_{ij}$ are sequentially read from BRAM to compute spin interactions. This time-multiplexed design allows hardware resource reduction while supporting arbitrary spin connectivity.
  • ...and 7 more figures
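
The spin-serial, replica-parallel schedule described in the captions for Figures 3 to 5 can be illustrated in software. The following is a minimal sketch, not the paper's implementation: Eq. (6) is not reproduced on this page, so the update rule here (sign of the local field plus binary noise), the linear ramp for $Q(t)$, the cyclic coupling to the next replica, and the names `q_schedule` and `ssqa_sweep` are all assumptions made for illustration. The sketch assumes the convention $H = -\sum_{i<j} J_{ij}\sigma_i\sigma_j$, so a spin tends to align with its local field.

```python
import random


def q_schedule(t, steps, q_min, q_max):
    """Hypothetical linear ramp for the replica coupling Q(t).

    The captions only say Q(t) is gradually increased; the linear
    shape is an assumption.
    """
    if steps <= 1:
        return q_max
    return q_min + (q_max - q_min) * t / (steps - 1)


def ssqa_sweep(J, spins, q, noise, rng):
    """One spin-serial, replica-parallel sweep (illustrative only).

    J     : N x N weight matrix (list of lists), shared by all replicas
    spins : R lists of N values in {-1, +1}, one list per replica
    q     : current replica coupling Q(t)
    noise : amplitude of the binary noise term (assumed form)
    Returns (new_spins, cycles), where cycles counts weight reads.
    """
    R, N = len(spins), len(J)
    new_spins = [row[:] for row in spins]
    cycles = 0
    for i in range(N):                # spin-serial: one target spin at a time
        acc = [0.0] * R               # one accumulator per replica
        for j in range(N):            # one weight J[i][j] is read per cycle
            w = J[i][j]
            for k in range(R):        # done concurrently across replicas in hardware
                acc[k] += w * spins[k][j]
            cycles += 1
        for k in range(R):
            # Coupling to the adjacent replica; the cyclic wrap is an assumption.
            coupling = q * spins[(k + 1) % R][i]
            field = acc[k] + coupling + noise * rng.choice((-1, 1))
            new_spins[k][i] = 1 if field >= 0 else -1
    return new_spins, cycles


if __name__ == "__main__":
    rng = random.Random(1)
    N, R, steps = 6, 4, 20
    # Antiferromagnetic weights, as in an unweighted complete-graph MAX-CUT.
    J = [[0 if i == j else -1 for j in range(N)] for i in range(N)]
    spins = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(R)]
    for t in range(steps):
        q = q_schedule(t, steps, 0.0, 2.0)
        spins, cycles = ssqa_sweep(J, spins, q, 1.0, rng)
    print(cycles, spins)
```

In this sketch each weight `J[i][j]` is fetched once and applied to all `R` accumulators, which mirrors the shared-weight access described for Figure 4, and one sweep takes `N * N` weight reads regardless of `R`. The sketch does not model the dual-BRAM delay line.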