Decomposing Large-Scale Ising Problems on FPGAs: A Hybrid Hardware Approach
Ruihong Yin, Yue Zheng, Chaohui Li, Ahmet Efe, Abhimanyu Kumar, Ziqing Zeng, Ulya R. Karpuzcu, Sachin S. Sapatnekar, Chris H. Kim
TL;DR
This work tackles the slow, CPU-bound decomposition bottleneck that prevents large-scale Ising solvers from operating at peak speed. By co-locating an FPGA-based decomposition engine with a 50-spin COBI Ising chip and employing a CSR-based memory model, dual-level parallelism, and a pipelined flow, the system reduces subproblem transfer latency and keeps the analog solver busy. The approach yields a 1.93× geometric-mean speedup (up to 2×) and more than 150× better energy efficiency than optimized CPU baselines, with projections of how performance scales with wider memory interfaces and more FPGA resources. This hardware-software co-design provides a generalizable framework for turning large NP problems mapped to SAT/Ising forms into practical, high-throughput mixed-hardware accelerators, with applications to 3SAT, MaxCut, bin packing, and related domains.
Abstract
Emerging analog computing substrates, such as oscillator-based Ising machines, offer rapid convergence times for combinatorial optimization but often suffer from limited scalability due to physical implementation constraints. To tackle real-world problems involving thousands of variables, problem decomposition is required; however, performing this step on standard CPUs introduces significant latency, preventing the high-speed solver from operating at full capacity. This work presents a heterogeneous system that offloads the decomposition workload to an FPGA, tightly integrated with a custom 28 nm Ising solver. By migrating the decomposition logic to reconfigurable hardware and utilizing parallel processing elements, the system minimizes the communication latency typically associated with host-device interactions. Our evaluation demonstrates that this co-design approach effectively bridges the speed gap between digital preprocessing and analog solving, achieving nearly 2× speedup and an energy efficiency improvement of over two orders of magnitude compared to optimized software baselines running on modern CPUs.
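
To make the decomposition step concrete, the following is a minimal software sketch, not the authors' FPGA implementation. It assumes that "CSR" denotes the standard compressed sparse row format, that the coupling matrix is symmetric and stored in full, and that the energy convention is E(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i. The function names (`dense_to_csr`, `extract_subproblem`, `ising_energy`) and the constant `MAX_SPINS` are hypothetical, chosen for illustration; only the 50-spin limit comes from the text above. The sketch extracts a subproblem over a chosen subset of spins and folds the couplings to all frozen spins into effective local fields.

```python
from typing import Dict, List, Sequence, Tuple

MAX_SPINS = 50  # solver capacity stated in the text; the name is hypothetical

CSR = Tuple[List[float], List[int], List[int]]  # (data, indices, indptr)


def dense_to_csr(J: Sequence[Sequence[float]]) -> CSR:
    """Store only the nonzero couplings of J, row by row."""
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in J:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))  # end of this row in data/indices
    return data, indices, indptr


def extract_subproblem(csr: CSR, h: Sequence[float], spins: Sequence[int],
                       subset: Sequence[int]):
    """Build the dense subproblem over `subset`, freezing all other spins."""
    assert len(subset) <= MAX_SPINS, "subproblem exceeds solver capacity"
    data, indices, indptr = csr
    local: Dict[int, int] = {g: k for k, g in enumerate(subset)}
    n = len(subset)
    sub_J = [[0.0] * n for _ in range(n)]
    sub_h = [0.0] * n
    for g, k in local.items():
        sub_h[k] = h[g]
        # Walk only the nonzero neighbours of global spin g.
        for p in range(indptr[g], indptr[g + 1]):
            j = indices[p]
            if j in local:
                sub_J[k][local[j]] = data[p]   # coupling stays in the subproblem
            else:
                sub_h[k] += data[p] * spins[j]  # frozen neighbour becomes a field
    return sub_J, sub_h


def ising_energy(J: Sequence[Sequence[float]], h: Sequence[float],
                 s: Sequence[int]) -> float:
    """E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i for symmetric J."""
    n = len(s)
    e = -sum(h[i] * s[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * s[i] * s[j]
    return e
```

Under these assumptions, any change to the subset spins changes the subproblem energy by the same amount as the full-problem energy, which is why a small solver can improve a large problem one subset at a time. Because CSR stores each row's neighbours contiguously, the extraction touches only nonzero couplings of the selected spins.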
