QPU Micro-Kernels for Stencil Computation
Stefano Markidis, Luca Pennati, Marco Pasquale, Gilbert Netzer, Ivy Peng
TL;DR
This work proposes QPU micro-kernels: shallow, per-node quantum subroutines that perform explicit stencil updates via Monte Carlo sampling, so that a classical time loop can orchestrate grid-wide PDE solvers with a fixed qubit footprint that is independent of grid size. It develops two realizations, Bernoulli and branching, along with variants for signed coefficients, higher dimensions, and stochastic forcing, and demonstrates their applicability to the 1D Heat and viscous Burgers' equations. The authors analyze spatial sampling errors, propose batching and in-circuit fusion to reduce submission and readout overheads, and validate the approach on noiseless simulators and the IBM Brisbane quantum device, where the Bernoulli realization is more accurate than branching under current hardware constraints. A single micro-kernel does not deliver quantum advantage, but the framework offers scalable parallelism across grid points, potential variance-reduction opportunities (e.g., iterative amplitude estimation, IAE), and a practical path toward quantum-assisted sampling in explicit stencil solvers within heterogeneous HPC workflows.
Abstract
We introduce QPU micro-kernels: shallow quantum circuits that perform a stencil node update and return a Monte Carlo estimate from repeated measurements. We show how to use them to solve Partial Differential Equations (PDEs) explicitly discretized on a computational stencil. From this point of view, the QPU serves as a sampling accelerator. Each micro-kernel consumes only stencil inputs (neighbor values and coefficients), runs a shallow parameterized circuit, and reports the sample mean of a readout rule. The resource footprint in qubits and depth is fixed and independent of the global grid. This makes micro-kernels easy to orchestrate from a classical host and to parallelize across grid points. We present two realizations. The Bernoulli micro-kernel targets convex-sum stencils by encoding values as single-qubit probabilities with shot allocation proportional to stencil weights. The branching micro-kernel prepares a selector over stencil branches and applies addressed rotations to a single readout qubit. In contrast to monolithic quantum PDE solvers that encode the full space-time problem in one deep circuit, our approach keeps the classical time loop and offloads only local updates. Batching and in-circuit fusion amortize submission and readout overheads. We test and validate the QPU micro-kernel method on two PDEs commonly arising in scientific computing: the Heat and viscous Burgers' equations. On noiseless quantum circuit simulators, accuracy improves as the number of samples increases. On the IBM Brisbane quantum computer, single-step diffusion tests show lower errors for the Bernoulli realization than for branching at equal shot budgets, with QPU micro-kernel execution dominating the wall time.
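To make the two realizations concrete, the following is a minimal classical sketch of one node update, not the authors' implementation. It assumes node values scaled to [0, 1], the standard explicit 1D heat stencil with weights (r, 1 - 2r, r) where r = alpha*dt/dx^2 <= 1/2, and a single-qubit encoding RY(theta)|0> with theta = 2*arcsin(sqrt(u)), so that the qubit reads 1 with probability u. Noiseless measurement is emulated with NumPy random draws in place of a QPU, and the branching selector is assumed to carry amplitudes sqrt(w_k). All function names are hypothetical.

```python
import numpy as np


def _check(neighbors, weights):
    """Validate inputs: values in [0, 1], weights forming a convex combination."""
    u = np.asarray(neighbors, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    if np.any(u < 0) or np.any(u > 1):
        raise ValueError("neighbor values must lie in [0, 1]")
    return u, w / w.sum()


def allocate_shots(weights, shots):
    """Split the shot budget proportionally to the weights (largest-remainder rounding)."""
    ideal = np.asarray(weights, dtype=float) * shots
    alloc = np.floor(ideal).astype(int)
    leftover = int(shots - alloc.sum())
    if leftover > 0:
        order = np.argsort(ideal - alloc)[::-1]  # largest fractional parts first
        alloc[order[:leftover]] += 1
    return alloc


def bernoulli_microkernel(neighbors, weights, shots, rng):
    """Bernoulli realization (sketch): one single-qubit experiment per stencil input."""
    u, w = _check(neighbors, weights)
    alloc = allocate_shots(w, shots)
    theta = 2.0 * np.arcsin(np.sqrt(u))            # assumed RY encoding angle
    p_one = np.clip(np.sin(theta / 2.0) ** 2, 0.0, 1.0)
    ones = rng.binomial(alloc, p_one)              # emulated measurement counts
    return ones.sum() / shots                      # estimate of sum_k w_k * u_k


def branching_microkernel(neighbors, weights, shots, rng):
    """Branching realization (sketch): selector picks branch k with probability w_k,
    then the readout qubit fires with probability u_k."""
    u, w = _check(neighbors, weights)
    branch = rng.choice(len(w), size=shots, p=w)   # emulated selector measurement
    ones = rng.random(shots) < u[branch]           # emulated readout qubit
    return ones.mean()


def heat_step(u, r, shots, rng, kernel=bernoulli_microkernel):
    """One explicit time step; the classical host loops over interior nodes."""
    u = np.asarray(u, dtype=float)
    new = u.copy()                                 # boundary values kept fixed
    for i in range(1, len(u) - 1):
        new[i] = kernel(u[i - 1:i + 2], [r, 1.0 - 2.0 * r, r], shots, rng)
    return new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u0 = np.clip(np.sin(np.pi * np.linspace(0.0, 1.0, 11)), 0.0, 1.0)
    print(heat_step(u0, r=0.25, shots=4096, rng=rng))
```

Each call touches only the three stencil inputs, so its cost is independent of the grid size, and the per-node calls inside `heat_step` are independent of one another. That independence is what makes batching and parallel submission natural in this picture. The statistical error of each estimate shrinks roughly as one over the square root of the shot count, consistent with the reported improvement in accuracy as the number of samples grows.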
