CHARME: A chain-based reinforcement learning approach for the minor embedding problem

Hoang M. Ngo; Nguyen H K. Do; Minh N. Vu; Tre' R. Jeter; Tamer Kahveci; My T. Thai

TL;DR

This work tackles the NP-hard minor embedding problem, which is critical to quantum annealing performance. It introduces CHARME, a chain-based reinforcement learning framework that combines a GCN policy, a state-transition procedure guaranteeing feasible embeddings, and an Order Exploration strategy for learning effective embedding orders. Empirical results show that CHARME uses fewer qubits than fast baselines such as Minorminer and ATOM while maintaining competitive runtimes, and that it outperforms OCT in several settings, especially for sparse graphs. The approach shows practical potential for scalable quantum optimization by improving embedding efficiency and training dynamics, with robust performance on both synthetic and real-world QUBO-derived graphs.

Abstract

Quantum annealing (QA) has great potential to solve combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms depends heavily on how problem instances, represented as logical graphs, are embedded into the quantum processing unit (QPU), whose topology is a graph with limited connectivity. This task is known as the minor embedding problem. Because the minor embedding problem is NP-hard \cite{Goodrich2018}, existing methods suffer from scalability issues when faced with larger problem sizes. In this paper, we propose CHARME, a novel approach that uses Reinforcement Learning (RL) techniques to address the minor embedding problem. CHARME includes three key components: a Graph Neural Network (GNN) architecture for policy modeling, a state transition algorithm that ensures solution validity, and an order exploration strategy for effective training. Through comprehensive experiments on synthetic and real-world instances, we demonstrate the efficiency of our proposed order exploration strategy as well as our proposed RL framework, CHARME. In particular, CHARME yields superior solutions in terms of qubit usage compared to fast embedding methods such as Minorminer and ATOM. Moreover, our method surpasses the OCT-based approach, known for its slower runtime but high-quality solutions, in several cases. In addition, our proposed exploration strategy makes training of the CHARME framework more efficient by providing better solutions than the greedy strategy.
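To make the embedding notion in the abstract concrete, the following is a minimal sketch of what a valid minor embedding has to satisfy and how qubit usage is counted. It is not taken from the paper: the function names `is_valid_embedding` and `qubit_usage`, the edge-list inputs, and the `chains` dictionary (logical node to set of hardware qubits) are all assumptions made for illustration.

```python
from collections import deque


def is_valid_embedding(logical_edges, hardware_edges, chains):
    """Check the three standard minor-embedding conditions.

    chains maps each logical node to a set of hardware qubits (its chain).
    All names here are hypothetical and only illustrate the definition.
    """
    # Build an adjacency map for the hardware graph.
    adj = {}
    for u, v in hardware_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # 1. Chains must be non-empty and pairwise disjoint.
    used = set()
    for chain in chains.values():
        if not chain or used & chain:
            return False
        used |= chain

    # 2. Each chain must induce a connected subgraph of the hardware graph.
    for chain in chains.values():
        start = next(iter(chain))
        reached = {start}
        queue = deque([start])
        while queue:
            q = queue.popleft()
            for nb in adj.get(q, set()) & chain:
                if nb not in reached:
                    reached.add(nb)
                    queue.append(nb)
        if reached != chain:
            return False

    # 3. Each logical edge needs at least one hardware edge between the two chains.
    for u, v in logical_edges:
        if u not in chains or v not in chains:
            return False
        if not any(adj.get(q, set()) & chains[v] for q in chains[u]):
            return False
    return True


def qubit_usage(chains):
    """Total number of hardware qubits used across all chains."""
    return sum(len(chain) for chain in chains.values())


# A triangle (logical graph) embedded into a 4-cycle (hardware graph).
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
good = {"a": {0}, "b": {1}, "c": {2, 3}}  # node c needs a chain of two qubits
bad = {"a": {0}, "b": {1}, "c": {2}}      # edge (a, c) has no hardware edge
```

In this toy case the triangle cannot be placed on the 4-cycle with one qubit per node, so node `c` is given a two-qubit chain. The total chain size is the qubit usage that the abstract uses as its quality measure.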

Paper Structure (21 sections, 1 theorem, 17 equations, 11 figures, 3 tables, 3 algorithms)

Introduction
Preliminaries
Minor Embedding Problem
Heuristic approaches for the minor embedding problem
Reinforcement Learning
Proposed Solutions
RL Framework
NaiveRL - An Initial RL Framework
CHARME: A Chain-Based RL Solution for the Minor Embedding Problem
Graph Representation and the Structure of the Policy
State Transition Algorithm
Exploration Strategy
Overview
Order Refining
Experiments
...and 6 more sections

Key Result

Theorem 1

Given a hardware graph $H = (V_H, E_H)$, a logical graph $P = (V_P,E_P)$, and the step limit $T = |V_P|$, assume that at the embedding step $t < T$, we have the sequence of selected actions $O_A = (\Bar{a}_1, \dots, \Bar{a}_{t-1})$ as a non-expansion prefix. Given a node $v\in V_P, v\notin O_A$, the

Figures (11)

Figure 1: A high-level overview of the minor embedding problem, in which a QUBO formulation represented by a logical graph is embedded into a hardware graph.
Figure 2: Workflow of the CHARME framework. The detailed architecture of the actor and critic is described in the subsequent part.
Figure 3: A GCN-based architecture of the models (actor and critic) representing the policy of the RL agent.
Figure 4: Performance of Order Exploration and Greedy Exploration in discovering embedding orders for training graphs in the synthetic training sets under various settings of $(n,d)$. In each subfigure, the x-axis denotes the number of exploration steps and the y-axis the corresponding efficiency score of the resulting order. Each orange (blue) point indicates the best efficiency score of the embedding order found by Order Exploration (Greedy Exploration) after a specific number of exploration steps.
Figure 5: Performance of Order Exploration and Greedy Exploration in discovering embedding orders for training graphs in the real training sets for low-density, medium-density, and high-density graphs. In each subfigure, the x-axis denotes the number of exploration steps and the y-axis the corresponding efficiency score of the resulting order. Each orange (blue) point indicates the best efficiency score of the embedding order found by Order Exploration (Greedy Exploration) after a specific number of exploration steps.
...and 6 more figures

Theorems & Definitions (2)

Definition 1
Theorem 1