A Unified Performance-Cost Landscape of Parallel p-bit Ising Machines Based on Update Dynamics

Naoya Onizawa, Takahiro Hanyu

Abstract

Parallel p-bit Ising machines are a promising platform for fast and energy-efficient combinatorial optimization, but their scalability depends on update synchronization, hardware delay, and architectural cost. In this work, we establish a unified performance-cost framework by analyzing synchronous and asynchronous update schemes under realistic constraints, including finite delay, time-multiplexed p-bit reuse, and limited DAC precision. We show that synchronous updates are not inherently unstable but can exhibit oscillations under excessive simultaneity, while asynchronous updates require slower operation due to hardware delay. To address this trade-off, we introduce time-multiplexed p-bit reuse with structured synchronous control, preserving correct annealing dynamics while reducing hardware requirements. This approach decouples statistical correctness from physical resources, enabling the number of p-bits and DACs to scale inversely with the reuse factor. As a result, synchronous architectures achieve comparable or better solution quality at less than half the hardware cost of optimized asynchronous designs on G-set MaxCut benchmarks (800-2000 nodes). We also show that low-resolution DACs (3-4 bits) are sufficient to reach near-optimal solutions when annealing time is properly adjusted. These findings provide practical design guidelines for scalable probabilistic computing hardware under realistic constraints.
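The synchronous and asynchronous update schemes contrasted in the abstract can be sketched in a minimal toy model. This is an illustrative software sketch only, not the paper's hardware implementation; the function names and the use of NumPy are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_step_sync(s, J, h, beta):
    """Synchronous update: all p-bits sample from local fields computed
    from the same previous state. With many simultaneous flips, this can
    produce coherent collective switching (oscillations)."""
    I = J @ s + h                                  # local fields from previous state
    p = 1.0 / (1.0 + np.exp(-2.0 * beta * I))      # sigmoidal flip probability
    return np.where(rng.random(s.size) < p, 1, -1)

def pbit_step_async(s, J, h, beta):
    """Asynchronous update: p-bits flip one at a time, each seeing the
    freshest neighboring states, at the cost of serialized operation."""
    s = s.copy()
    for i in rng.permutation(s.size):
        p = 1.0 / (1.0 + np.exp(-2.0 * beta * (J[i] @ s + h[i])))
        s[i] = 1 if rng.random() < p else -1
    return s
```

The sketch shows the essential trade-off analyzed in the paper: the synchronous step evaluates every local field from a stale snapshot of the state, while the asynchronous step serializes updates so each spin sees current neighbors.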


Paper Structure

This paper contains 5 sections, 11 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Unified performance--cost landscape using graph-averaged performance at a fixed simulation time of 500 ns, arranged in a 2$\times$2 layout for readability. The panels show all policies, asynchronous updates only, structured synchronous schedules, and random synchronous schedules, respectively. The horizontal axis shows normalized hardware cost ($C_{\mathrm{HW}}/N$) accounting for p-bit count and DAC resolution, and the vertical axis shows the mean normalized cut value averaged over all G-set instances for each parameter setting. Points are binned by hardware cost (40 bins); filled markers indicate the median and open markers the maximum within each bin. A single global legend is used across all panels.
  • Figure 2: Conceptual illustration of time-multiplexed p-bit reuse. (a) Without time-multiplexing ($c=1$), each logical p-bit maps to a dedicated physical p-bit. (b) With time-multiplexing ($c>1$), a physical p-bit sequentially emulates multiple logical p-bits across time slots, reducing physical resources while increasing the effective update interval.
  • Figure 3: Oscillation and stabilization in synchronous random updates. Energy time series for six representative G-set instances (G1, G6, G11, G34, G38, and G39) under synchronous tick-random updates. Each panel compares time-multiplexing reuse factors $c \in \{1, 1.25, 1.5, 2, 3\}$, where larger $c$ corresponds to stronger time-multiplexed reuse and a lower effective per-spin update rate. For $c=1$, many spins are updated simultaneously at each tick, leading to coherent collective switching and pronounced oscillations in the energy trajectory. As $c$ increases, update simultaneity is reduced, oscillations are suppressed, and the annealing dynamics become progressively more stable across all tested instances. A single legend is shared across all panels.
  • Figure 4: Sensitivity of asynchronous (Gillespie-type) updates to hardware delay. The normalized mean cut value is plotted as a function of the delay-to-update ratio $d/\tau$ for six G-set instances, with the apply delay fixed at $d = 5$ ns and a simulation time of 100 ns. Line color indicates the input-DAC resolution, and a single global legend is used for the full grid. For small $d/\tau$, asynchronous updates exhibit stable annealing behavior. As $d/\tau$ approaches unity, performance degrades because spins act on stale local fields computed from outdated neighboring states. This effect is exacerbated at low DAC resolution, demonstrating that asynchronous architectures are strongly constrained by the relationship between device latency and update interval.
  • Figure 5: Comparison of synchronous update policies. Left: the DAC bit width $b$ that maximizes the mean normalized cut value for each reuse factor $c$ and update policy. Right: the corresponding mean normalized cut value at that optimal bit width, with error bars indicating the min–max range across G-set instances. All results use a simulation time of 500 ns. Results are shown for tick-random, block-random, and block-random-stride scheduling policies; block-based schedules attain similar or higher performance with comparable or lower bit width than fully random updates.
  • ...and 3 more figures
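The time-multiplexed reuse pictured in Figure 2, and the stabilization effect it produces in Figure 3, can be sketched as a toy simulation. The block scheduling below is an illustrative assumption for the example, not the authors' hardware controller: with reuse factor $c$, only $\lceil N/c \rceil$ logical p-bits are updated per tick, reducing update simultaneity.

```python
import numpy as np

rng = np.random.default_rng(1)

def anneal_time_multiplexed(J, h, beta_schedule, c=2):
    """Block-structured synchronous annealing with reuse factor c: each
    tick updates only ceil(N/c) logical p-bits (one physical block), so
    fewer spins flip simultaneously than with c = 1."""
    N = J.shape[0]
    block = int(np.ceil(N / c))
    s = rng.choice(np.array([-1, 1]), size=N)       # random initial spins
    start = 0
    for beta in beta_schedule:                      # beta grows during annealing
        idx = np.arange(start, start + block) % N   # this tick's time slot
        I = J[idx] @ s + h[idx]                     # local fields of the block
        p = 1.0 / (1.0 + np.exp(-2.0 * beta * I))
        s[idx] = np.where(rng.random(block) < p, 1, -1)
        start = (start + block) % N                 # advance to the next block
    return s

def ising_energy(J, h, s):
    """Ising energy E = -(1/2) s^T J s - h^T s."""
    return -0.5 * s @ J @ s - h @ s
```

Larger $c$ lowers per-tick simultaneity (and, in hardware, the p-bit and DAC count) at the price of a longer effective update interval, mirroring the performance-cost trade-off the figures describe.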