Noise-Adaptive Quantum Circuit Mapping for Multi-Chip NISQ Systems via Deep Reinforcement Learning

Atiye Zeynali, Zahra Bakhshi

TL;DR

DeepQMap reframes quantum circuit mapping for multi‑chip NISQ systems as a noise‑aware sequential decision problem, using a bidirectional LSTM Dynamic Noise Adaptation (DNA) network to forecast short‑term hardware noise and a multi‑head attention module to capture long‑range qubit dependencies. The Rainbow DQN framework integrates prioritized replay, dueling networks, and multi‑step returns to learn robust, scalable mapping policies that minimize inter‑chip communication while maximizing fidelity. Empirical evaluation on 270 benchmarks spanning QFT, Grover, and VQE shows a 49.3% fidelity improvement over QUBO baselines, a 79.8% reduction in inter‑chip operations, and 8.2× faster training, with sustained performance up to 100 qubits. The approach generalizes across circuit families and hardware topologies, offering practical, scalable improvements for near‑term quantum computing. These results indicate that predictive noise modeling combined with structured RL can significantly enhance real‑world quantum compilation and control.

Abstract

The transition from monolithic to distributed multi-chip quantum architectures has fundamentally altered the circuit compilation landscape, introducing challenges in managing temporal noise variations and minimizing expensive inter-chip operations. We present DeepQMap, a deep reinforcement learning framework that integrates a bidirectional Long Short-Term Memory based Dynamic Noise Adaptation (DNA) network with multi-head attention mechanisms and a Rainbow DQN architecture. Unlike conventional static optimization approaches such as QUBO formulations, our method continuously adapts to hardware dynamics through learned temporal representations of quantum system behavior. Comprehensive evaluation across 270 benchmark circuits spanning the Quantum Fourier Transform, Grover's algorithm, and the Variational Quantum Eigensolver demonstrates that DeepQMap achieves a mean circuit fidelity of $0.920 \pm 0.023$, a statistically significant 49.3% improvement over state-of-the-art QUBO methods ($0.618 \pm 0.031$, $t_{98} = 4.87$, $p = 0.0023$, Cohen's $d = 2.34$). Inter-chip communication overhead falls by 79.8%, from 2.34 operations per circuit to 0.47. The DNA network maintains noise prediction accuracy with a coefficient of determination of $R^2 = 0.912$ and a mean absolute error of 0.87%, enabling proactive compensation for hardware fluctuations. Scalability analysis confirms sustained performance across systems of 20 to 100 qubits, with fidelity remaining above 0.87 even at maximum scale, where competing methods degrade below 0.60. Training converges $8.2\times$ faster than baseline approaches, completing in 45 minutes versus 370 minutes for QUBO optimization. The very large effect sizes support the practical significance of these gains for near-term noisy intermediate-scale quantum computing applications.
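The two noise-prediction metrics quoted in the abstract, $R^2$ and mean absolute error, follow standard definitions. The sketch below shows how they could be computed from paired actual and predicted noise levels. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper or its data.

```python
from statistics import fmean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = fmean(actual)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction error, in the units of the inputs."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical noise levels in percent (illustrative values only).
actual_noise = [2.0, 3.0, 4.0, 5.0]
predicted_noise = [2.5, 2.5, 4.5, 4.5]

r2 = r_squared(actual_noise, predicted_noise)
mae = mean_absolute_error(actual_noise, predicted_noise)
print(f"R^2 = {r2:.3f}, MAE = {mae:.2f}%")
```

An $R^2$ close to 1 means the predictions explain most of the variance in the measured noise, while the MAE reports the typical error in the same units as the noise level itself, which is why the two are usually quoted together.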

Paper Structure

This paper contains 25 sections, 21 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: DeepQMap multi-chip quantum system architecture. The framework orchestrates qubit placement across 4-6 quantum chips (orange boxes) arranged in ring, grid, or hexagonal topologies. The DeepQMap Agent (green components) comprises four integrated modules: Rainbow DQN for mapping decisions, DNA Network (LSTM) for noise prediction, Multi-Head Attention for qubit interaction modeling, and Prioritized Replay Buffer for experience management. Red dashed arrows indicate real-time feedback from the hardware (noise telemetry, qubit states) to the agent. Green solid arrows represent optimized control signals (physical mappings, inter-chip routing, noise mitigation strategies) from the agent to the hardware. The Logical Quantum Circuit (purple) defines the input computational task, while dual optimization goals (red ellipses) balance fidelity maximization against operation minimization. This bidirectional information flow enables dynamic adaptation to temporal hardware variations, a capability that static compilation approaches lack.
  • Figure 2: Comprehensive training convergence analysis over 500 episodes. (a) Fidelity progression demonstrates three distinct learning phases: exploration (0-165), with high variance as the agent tests diverse strategies; learning (166-335), with steady improvement and variance reduction; and exploitation (336-500), with stable convergence to 0.915. The shaded region shows 95% confidence intervals. The QUBO baseline (dashed blue) is surpassed by episode 180. (b) Phase-wise statistics confirm monotonic improvement from 0.655 to 0.872, validating learning effectiveness. (c) Inter-chip operations decrease dramatically from 4.2 to 0.30, with green shading highlighting the improvement zone below the QUBO average of 2.34. This 93% reduction directly translates to fidelity gains. (d) The epsilon decay schedule transitions from pure exploration ($\epsilon = 1.0$) to refined exploitation ($\epsilon = 0.01$) by episode 400, balancing discovery of novel strategies with refinement of successful patterns. (e) The DNA network's predictions (purple) closely track the actual noise (orange) over 200 episodes, maintaining $R^2 = 0.912$ and MAE = 0.87%. Accurate forecasting enables proactive adaptation rather than reactive compensation.
  • Figure 3: DNA network prediction performance and feature analysis. (a) Scatter plot of predicted versus actual noise levels demonstrates strong linear correlation ($R^2 = 0.912$). Points cluster tightly around the identity line (dashed), with minimal systematic bias. The color gradient indicates the prediction horizon (1-10 timesteps), showing graceful accuracy degradation with longer forecasts. One-step predictions achieve $R^2 = 0.943$, while ten-step predictions maintain $R^2 = 0.852$, both exceeding baseline methods. (b) Feature importance analysis via gradient attribution reveals that temporal history dominates (35.0% ± 3.0%), confirming strong autocorrelation in noise dynamics. Current noise level (25.0%) and thermal readings (20.0%) provide real-time state information. Hardware topology (12%) captures spatial dependencies between chips. Circuit-specific features contribute minimally (depth 5%, gate fidelity 2%, quantum volume 1%), indicating that noise primarily reflects hardware rather than computational workload.
  • Figure 4: Comprehensive multi-dimensional performance comparison. (a) Normalized metrics (scaled to [0,1]) show DeepQMap achieves scores above 0.90 for fidelity, operations, error rate, depth, and convergence, with only training time showing competitive parity. QUBO performs poorly across most dimensions despite faster inference. (b) Heatmap quantifies percentage improvements: +49.3% fidelity, -79.8% operations, -66.1% error rate, -30.2% depth versus QUBO. Color intensity (red = large improvement) highlights multi-objective optimization success. (c) Time-series tracking over 100 evaluation checkpoints demonstrates DeepQMap's consistent superiority and lower variance compared to volatile QUBO performance. (d) Box plots reveal DeepQMap's narrow distribution (IQR = 0.018) versus QUBO's wide spread (IQR = 0.047), confirming reliable performance across diverse circuits. (e) Statistical tests show $t_{98} = 4.87$, $p = 0.0023$ versus QUBO, and $t_{98} = 7.34$, $p < 0.0001$ versus the trivial baseline, both highly significant at $\alpha = 0.01$. (f) Radar chart synthesizes six performance dimensions, with DeepQMap dominating all axes: fidelity (0.92), speed (0.85), scalability (0.90), noise adaptation (0.95), memory efficiency (0.88), convergence (0.93).
  • Figure 5: Training dynamics and convergence characteristics. (a) Reward distribution shifts dramatically from early exploration (blue, mean $-387$, std $523$) to late exploitation (orange, mean $+1243$, std $89$). Histogram evolution shows increasing concentration around positive rewards, with the 95th percentile improving from $-12$ to $+1456$. The 421% reward improvement and 83% reduction in standard deviation indicate policy convergence to consistent high-quality solutions. (b) Training loss decreases exponentially from $2.74 \times 10^{-3}$ (episode 50) to $4.31 \times 10^{-4}$ (episode 500), following a characteristic stable RL learning curve. The smooth decline without oscillations confirms appropriate hyperparameter selection and effective gradient clipping. Joint DNA-RL training (green) shows slightly higher but more stable loss than RL-only training (blue), validating the benefits of auxiliary supervision.
  • ...and 2 more figures
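
The epsilon decay schedule described for Figure 2(d), which moves from $\epsilon = 1.0$ to $\epsilon = 0.01$ by episode 400, can be illustrated with a minimal sketch. The caption does not state the shape of the decay, so the linear annealing below is an assumption, and the function names `epsilon_at` and `select_action` are hypothetical rather than taken from the paper.

```python
import random


def epsilon_at(episode, eps_start=1.0, eps_end=0.01, decay_episodes=400):
    """Anneal epsilon from eps_start to eps_end, then hold it constant.

    Linear decay is an assumption; only the endpoints (1.0 and 0.01)
    and the episode count (400) come from the figure caption.
    """
    if episode >= decay_episodes:
        return eps_end
    frac = episode / decay_episodes
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


for episode in (0, 200, 400, 500):
    print(episode, round(epsilon_at(episode), 3))
```

Under this assumed schedule the agent acts almost entirely at random in early episodes and almost entirely greedily after episode 400, which matches the exploration and exploitation phases the caption describes.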