Decomposing Large-Scale Ising Problems on FPGAs: A Hybrid Hardware Approach

Ruihong Yin; Yue Zheng; Chaohui Li; Ahmet Efe; Abhimanyu Kumar; Ziqing Zeng; Ulya R. Karpuzcu; Sachin S. Sapatnekar; Chris H. Kim

TL;DR

This work targets the slow, CPU-bound decomposition step that keeps large-scale Ising solvers from running at peak speed. By co-locating an FPGA-based decomposition engine with a 50-spin COBI Ising chip and employing a CSR-based memory model, dual-level parallelism, and a pipelined flow, the system sharply reduces subproblem transfer latency and keeps the analog solver busy. The approach yields a 1.93× geomean speedup (up to 2×) and more than 150× better energy efficiency than optimized CPU baselines, with further gains projected for wider memory interfaces and more FPGA resources. This hardware-software co-design offers a generalizable framework for turning large NP problems mapped to SAT/Ising forms into practical, high-throughput mixed-hardware accelerators, applicable to 3SAT, MaxCut, bin packing, and related domains.
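To make the CSR-based memory model and the BFS-based subproblem selection more concrete, here is a minimal software sketch. It is an illustration under stated assumptions, not the paper's RTL: the function names `build_csr` and `bfs_select`, the dictionary input format, and the choice of seed are all hypothetical. The only figure taken from the text is the 50-spin limit of one COBI chip.

```python
from collections import deque

def build_csr(num_spins, couplings):
    """Build a CSR view of a symmetric coupling graph.

    couplings: dict {(i, j): J_ij}, each unordered pair listed once (assumed format).
    Returns (row_ptr, col_idx, values), the three standard CSR arrays.
    """
    rows = [[] for _ in range(num_spins)]
    for (i, j), w in couplings.items():
        rows[i].append((j, w))
        rows[j].append((i, w))  # store both directions so each row lists all neighbors
    row_ptr, col_idx, values = [0], [], []
    for r in rows:
        for j, w in sorted(r):
            col_idx.append(j)
            values.append(w)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values

def bfs_select(row_ptr, col_idx, seed, limit=50):
    """Pick up to `limit` spins by breadth-first search from `seed`.

    limit=50 mirrors the 50-spin capacity of one COBI chip.
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < limit:
        u = queue.popleft()
        # Neighbors of u occupy one contiguous slice of col_idx.
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            if v not in visited:
                visited.add(v)
                order.append(v)
                queue.append(v)
                if len(order) == limit:
                    break
    return order

# Toy example: a 6-spin chain, selecting a 4-spin subproblem around spin 2.
chain = {(0, 1): 1, (1, 2): -1, (2, 3): 1, (3, 4): 1, (4, 5): -1}
row_ptr, col_idx, values = build_csr(6, chain)
subset = bfs_select(row_ptr, col_idx, seed=2, limit=4)
```

CSR keeps each spin's neighbors in one contiguous slice, so a BFS step reads a single address range. That access pattern is one plausible reason such a layout suits a streaming hardware memory interface.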

Abstract

Emerging analog computing substrates, such as oscillator-based Ising machines, offer rapid convergence times for combinatorial optimization but often suffer from limited scalability due to physical implementation constraints. To tackle real-world problems involving thousands of variables, problem decomposition is required; however, performing this step on standard CPUs introduces significant latency, preventing the high-speed solver from operating at full capacity. This work presents a heterogeneous system that offloads the decomposition workload to an FPGA, tightly integrated with a custom 28 nm Ising solver. By migrating the decomposition logic to reconfigurable hardware and utilizing parallel processing elements, the system minimizes the communication latency typically associated with host-device interactions. Our evaluation demonstrates that this co-design approach effectively bridges the speed gap between digital preprocessing and analog solving, achieving nearly 2× speedup and an energy efficiency improvement of over two orders of magnitude compared to optimized software baselines running on modern CPUs.
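As a hedged illustration of what decomposition involves, the sketch below extracts a subproblem from a larger Ising instance by freezing every spin outside a chosen subset and folding its influence into the local fields of the subset. The abstract does not spell out this step, so the sign convention H(s) = -Σ J_ij s_i s_j - Σ h_i s_i and the names `ising_energy` and `extract_subproblem` are assumptions made for this example, not the paper's implementation.

```python
def ising_energy(J, h, s):
    """Energy under the assumed convention H = -sum J_ij s_i s_j - sum h_i s_i.

    J: dict {(i, j): J_ij}, each unordered pair listed once; h, s: lists.
    """
    e = -sum(h[i] * s[i] for i in range(len(s)))
    for (i, j), w in J.items():
        e -= w * s[i] * s[j]
    return e

def extract_subproblem(J, h, s, subset):
    """Return (sub_J, sub_h) for the spins in `subset`, all other spins frozen at s."""
    local = {g: k for k, g in enumerate(subset)}  # global index -> local index
    sub_h = [h[g] for g in subset]
    sub_J = {}
    for (i, j), w in J.items():
        if i in local and j in local:
            sub_J[(local[i], local[j])] = w   # coupling stays inside the subproblem
        elif i in local:
            sub_h[local[i]] += w * s[j]       # frozen neighbor j acts as an extra field
        elif j in local:
            sub_h[local[j]] += w * s[i]
        # couplings between two frozen spins only add a constant and are dropped
    return sub_J, sub_h

# Toy 4-spin instance; spins 1 and 2 form the subproblem.
J = {(0, 1): 1.0, (1, 2): -2.0, (2, 3): 0.5, (0, 3): 1.5}
h = [0.1, -0.2, 0.3, 0.0]
s = [1, -1, 1, -1]
sub_J, sub_h = extract_subproblem(J, h, s, subset=[1, 2])
```

Because the frozen spins contribute only a constant plus the adjusted fields, any energy change inside the subproblem equals the corresponding change in the full problem. That equivalence is what lets a small solver improve a large instance one subset at a time.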

Paper Structure (41 sections, 3 equations, 9 figures, 5 tables)

Introduction
Ising Machines for Combinatorial Optimization
The Coupled Oscillator Based Ising (COBI) Chip: A 50-Spin RO-Based Ising Solver
The Scalability Challenge: Decomposition is Essential
The Decomposition Bottleneck
Our Approach: FPGA-Accelerated Decomposition
Contributions
Background and Related Work
SAT, QUBO, and Ising Formulation
Decomposition Strategies
Hardware Implementations for Ising Machines
Problem Formulation and Decomposition Flow
Chancellor Construction for 3SAT
BFS-Based Decomposition Strategy
Variable Selection via BFS
...and 26 more sections

Figures (9)

Figure 1: (a) FPGA setup showing the FPGA decomposition accelerator (Xilinx Artix-7 XC7A35T) integrated on the same board with the Ising core. This on-board co-location eliminates PCIe communication overhead and enables microsecond-scale subproblem transfer. (b) Die photograph showing five COBI chips, each supporting 50 fully-connected spins. (c) Hardware implementation: an all-to-all array of CMOS ring oscillators (ROs) designed to solve the Ising problem.
Figure 2: Communication overhead analysis for CPU-based decomposition. PCIe data transfer latency dominates the decomposition pipeline, creating a bottleneck that prevents efficient utilization of the fast COBI Ising solver. This motivates co-locating decomposition logic with the Ising core using lightweight interfaces.
Figure 3: Architectural comparison between the conventional software-based decomposition approach and the proposed FPGA-accelerated framework. (Top) The baseline system suffers from high-latency iterative data transfers over the PCIe bus. (Middle) The proposed architecture offloads the decomposition logic to RTL, enabling a one-time global problem transfer and low-latency local interconnects. (Bottom) The corresponding algorithmic data flow illustrating the transformation from the large-scale global problem ($N>1000$) to local sub-problems.
Figure 4: Convergence iteration comparison across different decomposition strategies on 3SAT benchmarks. BFS achieves median convergence in approximately 50 iterations, which is significantly faster than energy-impact-based methods (approximately 200 iterations) and comparable to or better than alternative approaches.
Figure 5: Top-level architecture diagram showing detailed component interactions and data flow paths within the hybrid FPGA-Ising system.
...and 4 more figures
