Limitations in Parallel Ising Machine Networks: Theory and Practice

Matthew X. Burns, Michael C. Huang

TL;DR

This work proposes formal models of parallel IM execution, provides theoretical guarantees for probabilistic convergence, and offers practical heuristics for parameter and model selection informed by theoretical and numerical findings.

Abstract

Analog Ising machines (IMs) occupy an increasingly prominent area of computer architecture research, offering high-quality, low-latency, and low-energy solutions to intractable computing tasks. However, IMs have a fixed capacity and offer little to no utility on problems that exceed it. Previous works have proposed parallel, multi-IM architectures to circumvent this limitation. In this work we theoretically and numerically investigate tradeoffs in parallel IM networks to guide researchers in this burgeoning field. We propose formal models of parallel IM execution, then provide theoretical guarantees for probabilistic convergence. Numerical experiments illustrate our findings and provide empirical insight into the high- and low-synchronization-frequency regimes. We also provide practical heuristics for parameter and model selection, informed by our theoretical and numerical findings.

Paper Structure

This paper contains 20 sections, 5 theorems, 63 equations, and 11 figures.

Key Result

Theorem 1

Let $\mu_t$ and $\nu_t$ be the probability distributions of the concurrent and full-system processes, respectively, at time $\tau \geq 0$. Suppose … is the accumulated gradient error from process $X$ up to time $\tau$. Then the KL-Divergence between the two measures is …
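Theorem 1 ties the gap between the concurrent and full-system processes to an accumulated gradient error. To make that quantity concrete, here is a minimal pure-Python sketch under stated assumptions: both processes are taken to be overdamped Langevin dynamics $dX = -\nabla U(X)\,dt + \sqrt{2/\beta}\,dW$ on the quadratic potential $U(X)=\frac{1}{2}X^TJX$ from Figure 3, integrated with Euler-Maruyama, and the accumulated error is assumed to be $\int_0^{\tau} \|\nabla U(X_s) - g_s\|^2\,ds$, where $g_s$ is the gradient the subsystems actually use (live spins inside their own block, stale spins from the last synchronization outside it). For two diffusions sharing the same noise term, a standard Girsanov argument bounds the KL divergence of their time-$\tau$ marginals by $\beta/4$ times the expectation of such an integral. The function name, its parameters, and this error definition are hypothetical illustrations, not the paper's definitions.

```python
import math
import random


def concurrent_langevin(J, blocks, x0, dt, steps_per_epoch, epochs, beta, seed=0):
    """Euler-Maruyama sketch of a concurrent parallel process for U(x) = 0.5 * x^T J x.

    Hypothetical helper (not from the paper). Subsystem b updates only the
    coordinates in blocks[b]; for couplings to other subsystems it reads the
    snapshot x_sync taken at the last synchronization. Returns the final state
    and the accumulated error sum_k dt * ||grad U(x_k) - g_k||^2 (assumed form).
    """
    n = len(x0)
    owner = [None] * n
    for b, idx in enumerate(blocks):
        for i in idx:
            owner[i] = b
    if None in owner:
        raise ValueError("blocks must cover every coordinate")

    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * dt / beta)
    x = list(x0)
    err = 0.0
    for _ in range(epochs):
        x_sync = list(x)  # synchronization: every subsystem receives the global state
        for _ in range(steps_per_epoch):
            # Exact gradient of the full potential: (J x)_i for symmetric J.
            true_grad = [sum(J[i][j] * x[j] for j in range(n)) for i in range(n)]
            # Concurrent gradient: live values inside the block, stale values outside it.
            approx_grad = [
                sum(J[i][j] * (x[j] if owner[j] == owner[i] else x_sync[j]) for j in range(n))
                for i in range(n)
            ]
            err += dt * sum((g - a) ** 2 for g, a in zip(true_grad, approx_grad))
            # Euler-Maruyama step of dX = -g dt + sqrt(2 / beta) dW.
            x = [xi - dt * a + noise_scale * rng.gauss(0.0, 1.0) for xi, a in zip(x, approx_grad)]
    return x, err


# Hypothetical 4-spin example: positive-definite J, two 2-spin subsystems.
J = [[2.0, 0.5, 0.0, 0.5],
     [0.5, 2.0, 0.5, 0.0],
     [0.0, 0.5, 2.0, 0.5],
     [0.5, 0.0, 0.5, 2.0]]
blocks = [[0, 1], [2, 3]]
x0 = [1.0, -1.0, 0.5, -0.5]
_, err_sync = concurrent_langevin(J, blocks, x0, dt=0.01, steps_per_epoch=1, epochs=50, beta=10.0)
_, err_stale = concurrent_langevin(J, blocks, x0, dt=0.01, steps_per_epoch=10, epochs=5, beta=10.0)
print(err_sync)       # 0.0 (synchronized before every step)
print(err_stale > 0)  # True
```

With `steps_per_epoch=1` every gradient is evaluated immediately after a synchronization, so the error is exactly zero; longer epochs let the stale external field drift away from the true one, which is one way to picture the tradeoff between high and low synchronization frequency.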

Figures (11)

  • Figure 1: From Ref. sharma_increasing_2022. Estimated speedup degradation as solver capacity (500 spins) is exceeded. The right plot is a zoomed-in version of the left. "Ours" refers to an annealing-based divide-and-conquer scheme tested by the authors, while "D-Wave" refers to the qbSolv booth_partitioning_nodate framework.
  • Figure 2: Estimated IM speedup versus CPU-based parallel tempering (PT) running on 8 cores, on problems from the GSet suite noauthor_index_nodate. Simulated BRIM afoakwa_brim_2021 behavior represents a typical analog Ising machine, while D-Wave Kerberos reference_workflows_dwave represents a typical hybrid platform. All solvers are run to a minimum target of 99% of the best-known solution.
  • Figure 3: Illustration of the two primary parallel IM execution models. (a) A graphical illustration of a logical partition of the target potential $U$ into $U_{Int}$ and $U_{Ext}$. Here $U(X)$ is depicted as a quadratic function $U(X)=\frac{1}{2}X^TJX$ for some symmetric matrix $J$. The internal interactions comprising $U_{Int}$ have a block-diagonal structure (outlined in blue), while the external interactions lie in the off-diagonal blocks (highlighted in red). (b) The operation of the two execution models: serial and concurrent. During the annealing phase, serial execution simultaneously optimizes multiple independent replicas $X$, $Y$, $Z$, and $V$, while concurrent operation optimizes a single replica $X$. During a synchronization step, serial execution transfers replicas between chips to conditionally optimize the next subspace in a block Gibbs fashion, while concurrent operation globally synchronizes the local spin representations of all subsystems. (c) A bipartite model representing the concurrent model. During an annealing stage, the $X$ subsystems are independent given $X_{0}$. During synchronization, $X_{0}$ is sampled from $X$, and the process begins again in the next epoch.
  • Figure 4: Simulated linear model behavior on graph G1 versus unitless time. (a) Concurrent mode cut value versus synchronization epoch $\tau$ for the unweighted graph G1 using 2, 4, and 8 subsystems. Cut value 0 indicates that the system diverged to the degenerate $\pm 1^N$ state. (b) State overlap $\langle X, 1^N\rangle$ versus time for 4-subsystem concurrent execution with a $\tau=0.6$ synchronization epoch.
  • Figure 5: Simulated $W_1$ convergence on a 12-spin Ising lattice for a parallel, concurrent process and an ideal Langevin process. The number of subsystems $B$ for the concurrent process was either 2, 4, or 6, and is denoted by the superscript $\mu^{(B)}_t$. Each solver started from a uniform random distribution. Error bars show the effect of sampling error (primarily visible on the $W_1(\nu_t,\pi)$ curve).
  • ...and 6 more figures
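Figure 3(a) splits the quadratic potential $U(X)=\frac{1}{2}X^TJX$ into internal (block-diagonal) and external (off-diagonal) couplings, and Figure 4 scores concurrent runs by cut value. A minimal pure-Python sketch of both ideas follows, assuming spins are ordered so that each subsystem's indices are contiguous (which is what makes the internal part block diagonal) and that the cut is taken over a symmetric edge-weight matrix $W$ with zero diagonal. The helper names `split_couplings`, `potential`, and `cut_value` are hypothetical.

```python
def split_couplings(J, blocks):
    """Split a symmetric coupling matrix J into J_int + J_ext.

    J_int keeps couplings between spins in the same subsystem (the block-diagonal
    part when each block's indices are contiguous); J_ext keeps the rest.
    """
    n = len(J)
    owner = {}
    for b, idx in enumerate(blocks):
        for i in idx:
            owner[i] = b
    J_int = [[J[i][j] if owner[i] == owner[j] else 0.0 for j in range(n)] for i in range(n)]
    J_ext = [[J[i][j] - J_int[i][j] for j in range(n)] for i in range(n)]
    return J_int, J_ext


def potential(J, x):
    """U(x) = 0.5 * x^T J x."""
    n = len(x)
    return 0.5 * sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def cut_value(W, s):
    """Total weight of edges (i < j) whose endpoints carry opposite spins."""
    n = len(s)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n) if s[i] * s[j] < 0)


# 4-cycle 0-1-2-3-0 with unit weights, split into two 2-spin subsystems.
W = [[0.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0, 0.0]]
J_int, J_ext = split_couplings(W, [[0, 1], [2, 3]])
x = [0.3, -1.2, 0.7, 2.0]
print(abs(potential(W, x) - potential(J_int, x) - potential(J_ext, x)) < 1e-12)  # True
print(cut_value(W, [1, 1, 1, 1]))    # 0 (degenerate +1^N state)
print(cut_value(W, [1, -1, 1, -1]))  # 4.0
```

Because $U$ is linear in $J$, $U = U_{Int} + U_{Ext}$ holds exactly for any state. With $J = W$ and $\pm 1$ spins, $\mathrm{cut}(s) = \frac{1}{4}\sum_{i,j} W_{ij} - \frac{1}{2}U(s)$, so minimizing $U$ maximizes the cut, while the degenerate $\pm 1^N$ state places every vertex on one side and scores 0.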

Theorems & Definitions (11)

  • Definition 1: Separable Potential
  • Example 1: Linear Systems
  • Example 2: Kuramoto Model
  • Theorem 1: KL-Divergence of the Approximated Process
  • Theorem 2: Upper Bound (I)
  • Corollary 1
  • Theorem 3: Sufficient Condition for Convergence (II)
  • Corollary 2
  • proof
  • proof
  • ...and 1 more