Standalone FPGA-Based QAOA Emulator for Weighted-MaxCut on Embedded Devices
Seonghyun Choi, Kyeongwon Lee, Jae-Jin Lee, Woojoo Lee
TL;DR
The paper tackles the challenge of running quantum-inspired optimization on resource-constrained edge devices by presenting a standalone FPGA-based emulator for QAOA applied to Weighted-MaxCut. It introduces hardware-aware optimizations that express the cost and mixer unitaries through diagonal phase operations and Hadamard transforms, enabling an $O(N)$ pipeline-based emulation, where $N=2^n$ is the state-space size for $n$ qubits. A dedicated Quantum MaxCut Accelerator (QMA) is integrated into a RISC-V SoC and complemented by the QC Emulator eXpress (QEX) tool for automatic RTL generation. Together they deliver substantial energy and time savings (up to $2{,}182\times$ and $852\times$, respectively, for 9 qubits) while scaling to 8–9 qubits on mid- and low-end FPGAs. These results demonstrate practical QC emulation on embedded hardware, offering a concrete pathway toward edge-deployed QAOA-style optimization and benchmark-ready architectures for future mobile and IoT applications.
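The decomposition can be illustrated with a minimal software sketch, assuming a plain statevector model rather than the QMA hardware pipeline. For Weighted-MaxCut, the cost unitary $e^{-i\gamma C}$ is diagonal in the computational basis. Because $X = HZH$, the mixer satisfies $e^{-i\beta\sum_j X_j} = H^{\otimes n}\, e^{-i\beta\sum_j Z_j}\, H^{\otimes n}$. One QAOA layer therefore needs only element-wise phase multiplications and Hadamard butterfly passes. The function names (`cost_diagonal`, `hadamard_all`, `qaoa_layer`, `expected_cut`), the `(i, j, w)` edge format, and the mapping of qubit `j` to bit `j` of the basis index are hypothetical choices made for this illustration.

```python
import cmath
import math


def cost_diagonal(n, edges):
    """Weighted-MaxCut value C(z) for every basis state z in 0..2^n - 1.

    edges: list of (i, j, w); an edge contributes w when bits i and j of z differ.
    """
    return [sum(w for i, j, w in edges if ((z >> i) & 1) != ((z >> j) & 1))
            for z in range(1 << n)]


def hadamard_all(state, n):
    """Apply H to every qubit in place, one butterfly pass per qubit."""
    s = 1 / math.sqrt(2)
    for q in range(n):
        stride = 1 << q
        for z in range(len(state)):
            if not z & stride:  # bit q of z is 0; its partner z | stride has bit q = 1
                a, b = state[z], state[z | stride]
                state[z], state[z | stride] = s * (a + b), s * (a - b)


def qaoa_layer(state, n, cdiag, gamma, beta):
    """One QAOA layer: diagonal cost phase, then the mixer as H^n . diag . H^n."""
    # Cost unitary exp(-i*gamma*C) is diagonal: one phase multiply per amplitude.
    for z in range(len(state)):
        state[z] *= cmath.exp(-1j * gamma * cdiag[z])
    # Mixer exp(-i*beta*sum_j X_j) rewritten via X = H Z H on every qubit.
    hadamard_all(state, n)
    for z in range(len(state)):
        zsum = n - 2 * bin(z).count("1")  # eigenvalue of sum_j Z_j on |z>
        state[z] *= cmath.exp(-1j * beta * zsum)
    hadamard_all(state, n)


def expected_cut(n, edges, gammas, betas):
    """Expected cut <C> after len(gammas) layers, starting from |+>^n."""
    size = 1 << n
    cdiag = cost_diagonal(n, edges)
    state = [complex(1 / math.sqrt(size))] * size  # uniform superposition
    for gamma, beta in zip(gammas, betas):
        qaoa_layer(state, n, cdiag, gamma, beta)
    return sum(abs(a) ** 2 * c for a, c in zip(state, cdiag))
```

As a sanity check, a single unit-weight edge gives `expected_cut(2, [(0, 1, 1.0)], [math.pi / 2], [math.pi / 8])` equal to 1.0 up to rounding. This matches the $p=1$ value $\tfrac12 + \tfrac12\sin 4\beta\,\sin\gamma$ for an isolated edge. In this software form, each diagonal stage costs one multiply per amplitude, while the Hadamard stages are $n$ sweeps of $N/2$ butterflies. How the QMA pipeline schedules those stages is specific to the paper's hardware and is not modeled here.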
Abstract
Quantum computing (QC) emulation is crucial for advancing QC applications, especially given the scalability constraints of current devices. FPGA-based designs offer an efficient and scalable alternative to traditional large-scale platforms, but most are tightly integrated with high-performance systems, which limits their use in mobile and edge environments. This study introduces a compact, standalone FPGA-based QC emulator designed for embedded systems, leveraging the Quantum Approximate Optimization Algorithm (QAOA) to solve the Weighted-MaxCut problem. By restructuring QAOA operations for hardware compatibility, the proposed design reduces time complexity from $O(N^2)$ to $O(N)$, where $N = 2^n$ for $n$ qubits. This reduction, coupled with a pipeline architecture, substantially lowers resource consumption, enabling support for up to nine qubits on mid-tier FPGAs, roughly three times more than comparable designs. Additionally, compared with software-based QAOA on embedded processors, the emulator achieved energy savings ranging from 1.53 times for two-qubit configurations to 852 times for nine-qubit configurations. These results highlight the practical scalability and resource efficiency of the proposed design, providing a robust foundation for QC emulation in resource-constrained edge devices.
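To give the $O(N^2)$ to $O(N)$ reduction a concrete scale, here is a rough count of complex multiplications for applying one unitary to the state vector at nine qubits. The comparison assumes a dense $N \times N$ matrix-vector product on one side and an element-wise diagonal phase on the other. Hadamard stages and pipelining are deliberately left out of the count:

$$
N = 2^9 = 512, \qquad
\underbrace{N^2 = 512^2 = 262{,}144}_{\text{dense } N \times N \text{ unitary}}
\quad \text{vs.} \quad
\underbrace{N = 512}_{\text{diagonal unitary}},
\qquad \frac{N^2}{N} = N = 512 .
$$

This is only an operation-count illustration of the asymptotic claim. It is not a measured speedup or energy figure from the emulator.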
