Highly Versatile FPGA-Implemented Cyber Coherent Ising Machine

Toru Aonishi, Tatsuya Nagasawa, Toshiyuki Koizumi, Mastiyage Don Sudeera Hasaranga Gunathilaka, Kazushi Mimura, Masato Okada, Satoshi Kako, Yoshihisa Yamamoto

TL;DR

This work presents a highly versatile FPGA-based cyber CIM that implements the open-loop CIM, the closed-loop CIM, and Jacobi SOR in FP32 arithmetic, enabling dense Ising and QUBO optimization with Zeeman terms for real-world problems. The architecture supports continuous-valued interactions, scales to $N=4096$ on a single FPGA, and uses a unified design that outperforms GPU implementations on benchmarks such as CDMA multi-user detection and L0-RBCS, with speedups exceeding an order of magnitude in many cases. By combining flexible control sequencing with FP32 arithmetic, the system extends CIM applicability to diverse problem classes, including compressed sensing and multi-user detection, and can be accelerated further through clustering or multi-FPGA deployment. The results demonstrate competitive accuracy and substantial runtime advantages, while highlighting practical limitations (e.g., J_MUX wiring delay) and avenues for further parallelism to improve scalability.

Abstract

In recent years, quantum Ising machines have drawn considerable attention, but physical implementation constraints have made it difficult to achieve dense coupling, such as full coupling among enough spins to handle practical large-scale applications. Consequently, classically computable equations have been derived from the quantum master equations of these machines. Parallel FPGA implementations of the resulting algorithms have been used to rapidly solve problems at a scale that is difficult to achieve in physical systems. We have developed an FPGA-implemented cyber coherent Ising machine (cyber CIM) that is much more versatile than previous FPGA implementations. The architecture can be applied to the open-loop CIM, which was proposed when CIM research began, to the closed-loop CIM, which has been used more recently, and to the Jacobi successive over-relaxation (SOR) method. By modifying the sequence control code of the calculation control module, other algorithms such as Simulated Bifurcation (SB) can also be implemented. Earlier large-scale FPGA implementations of SB and CIM used binary or ternary discrete values for the couplings, whereas the cyber CIM uses FP32 values. The cyber CIM also supports Zeeman terms represented in FP32, which were not available in other large-scale FPGA systems. Our implementation with continuous interactions realizes N=4096 on a single FPGA, matching the N=4096 of the single-FPGA implementation of SB with binary interactions. The cyber CIM enables applications such as CDMA multi-user detection and L0 compressed sensing, which were not possible with earlier FPGA systems, while running more than ten times faster than a GPU implementation. The calculation speed can be further improved by increasing parallelism, for example through clustering.
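
To make the role of FP32 couplings and Zeeman terms concrete, the following is a minimal sketch of an Ising energy with continuous-valued interactions and an external-field (Zeeman) term. The sign convention $H = -\frac{1}{2}\sigma^\top J \sigma - h^\top \sigma$, the function names `ising_energy` and `local_field`, and the random test instance are assumptions made for illustration; the paper's own convention and scaling may differ.

```python
import numpy as np

def ising_energy(J, h, sigma):
    """Ising energy with a Zeeman term (assumed sign convention):
    H = -1/2 * sigma^T J sigma - h^T sigma."""
    return -0.5 * sigma @ J @ sigma - h @ sigma

def local_field(J, h, sigma):
    """Local field acting on each spin: sum_j J_ij sigma_j + h_i."""
    return J @ sigma + h

# Hypothetical small instance with continuous FP32 couplings and fields.
rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N)).astype(np.float32)
J = (A + A.T) / 2            # symmetric, dense, continuous-valued
np.fill_diagonal(J, 0.0)     # no self-coupling
h = rng.standard_normal(N).astype(np.float32)
sigma = rng.choice(np.array([-1.0, 1.0], dtype=np.float32), size=N)

energy = ising_energy(J, h, sigma)
field = local_field(J, h, sigma)
```

Under this convention, flipping spin $i$ changes the energy by $2\sigma_i(\sum_j J_{ij}\sigma_j + h_i)$, so the dense matrix-vector product in `local_field` is the dominant cost of each update step.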

Paper Structure (28 sections, 23 equations, 9 figures, 9 tables, 4 algorithms)

Figures (9)

  • Figure 1: Relationship among the implemented algorithms. L0-RBCS is realized by alternating between the open-loop or closed-loop CIM and the Jacobi SOR algorithm. The open-loop or closed-loop CIM can also be executed on its own for various combinatorial optimization problems.
  • Figure 2: Indices $P_b$, $P_r$, and $P_c$, showing the parallelization format used for matrix-vector multiplication in the local field calculation. (A) Block parallelization index $P_b$ of the local field calculation. The local field calculation is partitioned into $P_b$ blocks, and the process in (B) is applied to each block in parallel. (B) Matrix-vector multiplication parallelization indices $P_r$ and $P_c$ of the block-partitioned local field calculation. In one parallel MAC operation, a $P_r\times P_c$ matrix and a $P_c$-dimensional vector are multiplied. Repeating this $N/P_c$ times yields $P_r$ entries of the local field. This calculation is repeated $N_b/P_r$ times to compute all entries of the local field, where $N_b=N/P_b$.
  • Figure 3: FPGA architecture. (A) Block diagram showing the relationships between individual FPGA modules. An overview of each module is given in Table \ref{table:FPGA_module_layer}. (B) CAL_CSR functional diagram. (C) CAL_H functional diagram. (D) Storage scheme for matrix $J$.
  • Figure 4: Parallelization scheme and number of cycles per step. (A) Our architecture ($P_b=1$). (B) FPGA-SB ($P_b>1$).
  • Figure 5: Ratios of cycles-per-step for our architecture and FPGA-SB for $N=2048$ and $N=4096$.
  • ...and 4 more figures
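
The parallelization scheme in the Figure 2 caption can be sketched in software as a tiled matrix-vector product. This is only an illustrative emulation, assuming the local field is the plain product $Jx$ (no Zeeman term) and that $N$ is divisible by $P_b$, $N_b$ by $P_r$, and $N$ by $P_c$; the function name `blocked_local_field` and the returned operation count are hypothetical and ignore hardware details such as pipeline latency.

```python
import numpy as np

def blocked_local_field(J, x, P_b, P_r, P_c):
    """Emulate the P_b / P_r / P_c tiling of the local field h = J @ x.

    Returns the local field and the number of sequential P_r x P_c MAC
    operations each block performs (blocks run in parallel in hardware,
    here they are simply looped over).
    """
    N = J.shape[0]
    assert N % P_b == 0, "N must be divisible by P_b"
    N_b = N // P_b                      # rows handled by one block
    assert N_b % P_r == 0 and N % P_c == 0

    h = np.zeros(N, dtype=np.float32)
    mac_ops_per_block = 0
    for b in range(P_b):                # block-level parallelism
        row0 = b * N_b
        ops = 0
        for r in range(N_b // P_r):     # repeated N_b / P_r times
            rows = slice(row0 + r * P_r, row0 + (r + 1) * P_r)
            acc = np.zeros(P_r, dtype=np.float32)
            for c in range(N // P_c):   # repeated N / P_c times
                cols = slice(c * P_c, (c + 1) * P_c)
                # One parallel MAC: (P_r x P_c) tile times P_c-vector.
                acc += J[rows, cols] @ x[cols]
                ops += 1
            h[rows] = acc               # P_r entries of the local field
        mac_ops_per_block = ops
    return h, mac_ops_per_block

# Hypothetical small example (sizes chosen only for illustration).
rng = np.random.default_rng(1)
N = 64
J = rng.standard_normal((N, N)).astype(np.float32)
x = rng.standard_normal(N).astype(np.float32)
h_tiled, ops = blocked_local_field(J, x, P_b=2, P_r=4, P_c=8)
```

In this sketch each block performs $(N_b/P_r)\times(N/P_c)$ sequential MAC operations, so increasing any of $P_b$, $P_r$, or $P_c$ reduces the sequential work per step at the cost of more parallel hardware.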