A general learning scheme for classical and quantum Ising machines

Ludwig Schmid; Enrico Zardini; Davide Pastorello

TL;DR

This work proposes a general learning scheme that uses the ground-state energy of an Ising model as the predictive output and trains the couplings by gradient descent, with gradients estimated from the Ising machine itself rather than by explicit backpropagation. The model $F(\boldsymbol{\theta}|\boldsymbol{\Gamma},\lambda,\epsilon)=\lambda E_0(\boldsymbol{\theta},\boldsymbol{\Gamma})+\epsilon$ is trained by minimizing a mean-squared-error loss, and update rules for $\boldsymbol{\Gamma}$, $\lambda$, and $\epsilon$ are derived from the Ising-machine outputs $E_0$ and $\boldsymbol{z}^*$. The approach applies to both classical and quantum Ising machines; in the quantum case, quantum resources are used for both execution and training. It is demonstrated through proof-of-concept experiments on random data, function approximation, and Bars-and-Stripes classification on a D-Wave system. The results illustrate the feasibility of Ising-machine-driven learning and raise open questions about expressibility, training dynamics, and practical enhancements for larger-scale problems.
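A minimal sketch of why the Ising machine can supply its own gradients, under assumptions not taken from the paper: the energy is taken to be $E(\boldsymbol z)=\sum_i\theta_i z_i+\sum_{i<j}\Gamma_{ij}z_iz_j$, and the hypothetical helper `brute_force_ising_machine` replaces a real Ising machine with exhaustive search (feasible only for a handful of spins). Since $E_0$ is the minimum of finitely many functions that are linear in $\boldsymbol\Gamma$, wherever the ground state is unique its derivative with respect to $\Gamma_{ij}$ is $z_i^*z_j^*$. The code reads the gradient of $F$ off the solver output and checks it against finite differences.

```python
import itertools
import random


def brute_force_ising_machine(theta, gamma):
    """Hypothetical exact stand-in for an Ising machine (exhaustive search).

    Assumed energy convention, with z_i in {-1, +1}:
        E(z) = sum_i theta_i z_i + sum_{i<j} gamma[(i, j)] z_i z_j
    Returns (E_0, z_star), z_star being one minimizing configuration.
    Cost grows as 2**n, so this is only for tiny examples.
    """
    best_e, best_z = float("inf"), None
    for z in itertools.product((-1, 1), repeat=len(theta)):
        e = sum(t * zi for t, zi in zip(theta, z))
        e += sum(g * z[i] * z[j] for (i, j), g in gamma.items())
        if e < best_e:
            best_e, best_z = e, z
    return best_e, best_z


def model_and_gradients(theta, gamma, lam, eps):
    """Return F = lam * E_0 + eps and its gradients, read off the solver output.

    dF/dGamma_ij = lam * z*_i * z*_j   (valid where the ground state is unique)
    dF/dlam      = E_0
    dF/deps      = 1
    """
    e0, z = brute_force_ising_machine(theta, gamma)
    grad_gamma = {(i, j): lam * z[i] * z[j] for (i, j) in gamma}
    return lam * e0 + eps, grad_gamma, e0, 1.0


def max_finite_difference_error(n=4, lam=0.7, eps=0.3, h=1e-6, seed=0):
    """Largest gap between forward finite differences and lam * z*_i * z*_j."""
    rng = random.Random(seed)
    theta = [rng.uniform(-1, 1) for _ in range(n)]
    gamma = {(i, j): rng.uniform(-1, 1) for i in range(n) for j in range(i + 1, n)}
    f, grad_gamma, _, _ = model_and_gradients(theta, gamma, lam, eps)
    worst = 0.0
    for key in gamma:
        shifted = dict(gamma)
        shifted[key] += h  # perturb a single coupling
        f_shifted, _, _, _ = model_and_gradients(theta, shifted, lam, eps)
        worst = max(worst, abs((f_shifted - f) / h - grad_gamma[key]))
    return worst


print(f"max finite-difference mismatch: {max_finite_difference_error():.2e}")
```

The agreement only holds away from degenerate ground states, where $E_0$ has a kink; at such points $z_i^*z_j^*$ is a supergradient of the concave function $E_0$ rather than a true derivative.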

Abstract

An Ising machine is any hardware specifically designed for finding the ground state of the Ising model. Relevant examples are coherent Ising machines and quantum annealers. In this paper, we propose a new machine learning model that is based on the Ising structure and can be efficiently trained using gradient descent. We provide a mathematical characterization of the training process, which is based upon optimizing a loss function whose partial derivatives are not explicitly calculated but estimated by the Ising machine itself. Moreover, we present some experimental results on the training and execution of the proposed learning model. These results point out new possibilities offered by Ising machines for different learning tasks. In particular, in the quantum realm, the quantum resources are used for both the execution and the training of the model, providing a promising perspective in quantum machine learning.

Paper Structure (15 sections, 1 theorem, 27 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Ising machines
The proposed model
Definition
Training process
Hidden spins
Computational cost
Empirical evaluation
Experimental setup
Random data
Function approximation
Bars and stripes
Choice of hyperparameters
Conclusion
Funding information

Key Result

Theorem 3

Let $F(\theta\vert\Gamma,\lambda,\epsilon)=\lambda\,\mathsf E_0(\theta,\Gamma)+\epsilon$ be the parametric model, $\mathcal{D} = \{(\theta^{(a)}, y^{(a)})\}_{a=1,...,N}$ be a training set for $F$, $\mathcal{L}$ be the mean squared error (MSE) loss on $\mathcal{D}$, and $\eta>0$ be the learning rate. Then, the partial derivatives of $F$ with respect to the couplings … where $\Gamma^{(k)}$, $\lambda^{(k)}$, and $\epsilon^{(k)}$ are the values of the parameters within the …
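Update rules of this kind can be sketched with the chain rule. Assuming the energy convention $\mathsf E(\mathbf z;\theta,\Gamma)=\sum_i\theta_i z_i+\sum_{i<j}\Gamma_{ij}z_iz_j$ and the normalization $\mathcal L=\frac1N\sum_a\big(F(\theta^{(a)})-y^{(a)}\big)^2$ (both are assumptions here, so the constants may differ from the paper's), the key step is that $\mathsf E_0$ is a minimum over finitely many functions linear in $\Gamma$, so at a unique ground state its derivative is that of the active configuration. Writing $\mathsf E_0^{(a)}=\mathsf E_0(\theta^{(a)},\Gamma^{(k)})$ and $\mathbf z^{*(a)}$ for the Ising-machine outputs on sample $a$ at step $k$:

```latex
\begin{aligned}
\frac{\partial \mathsf E_0}{\partial \Gamma_{ij}}(\theta,\Gamma) &= z_i^* z_j^*
  \qquad \text{(where the ground state } \mathbf z^* \text{ is unique)},\\
\frac{\partial F}{\partial \Gamma_{ij}} &= \lambda\, z_i^* z_j^*, \qquad
  \frac{\partial F}{\partial \lambda} = \mathsf E_0(\theta,\Gamma), \qquad
  \frac{\partial F}{\partial \epsilon} = 1,\\[4pt]
r^{(a)} &= \lambda^{(k)} \mathsf E_0^{(a)} + \epsilon^{(k)} - y^{(a)},\\
\Gamma_{ij}^{(k+1)} &= \Gamma_{ij}^{(k)} - \frac{2\eta}{N}\sum_{a=1}^{N} r^{(a)}\,\lambda^{(k)}\, z_i^{*(a)} z_j^{*(a)},\\
\lambda^{(k+1)} &= \lambda^{(k)} - \frac{2\eta}{N}\sum_{a=1}^{N} r^{(a)}\,\mathsf E_0^{(a)},\\
\epsilon^{(k+1)} &= \epsilon^{(k)} - \frac{2\eta}{N}\sum_{a=1}^{N} r^{(a)}.
\end{aligned}
```

Every quantity on the right-hand side is either a current parameter value, a label, or an Ising-machine output ($\mathsf E_0^{(a)}$, $\mathbf z^{*(a)}$), which is what allows training without explicit backpropagation.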

Figures (8)

Figure 1: Ising model and Ising machine: On the left, an illustration of the graph structure of an Ising model on a fully connected graph with $\vert V\vert = 5$ spins ${\textbf{z}}$, corresponding biases $\theta$, and couplings $\Gamma$. Applying an Ising machine to this model yields the right-hand side of the figure, where each binary variable $z_i$ is assigned a value in $\{-1, +1\}$ (illustrated as white/black nodes). The output is the spin configuration ${\textbf{z}}^\ast$ and the corresponding minimal energy $\mathsf E_0(\theta,\Gamma)$.
Figure 2: Model training: Illustration of the training process for the proposed model. Given a dataset $\mathcal{D} = \{(\theta^{(a)}, y^{(a)})\}_{a=1,...,N}$, an Ising model is instantiated for each sample by setting the biases to $\theta^{(a)}$ and using the couplings $\Gamma$ as free parameters. For each of these models, an Ising machine is run to obtain the spin configuration ${\textbf{z}}^\ast$ and the corresponding minimal energy $\mathsf E_0$. Finally, the collected values are used to update the couplings $\Gamma$ and the two additional parameters $\lambda$ and $\epsilon$ according to the update rules of Theorem 3. This procedure is repeated for $N_\mathrm{epochs}$ epochs, after which the trained model $F_{\mathrm{model}}(\theta) = F(\theta \vert \Gamma^{(N_{\mathrm{epochs}})}, \lambda^{(N_{\mathrm{epochs}})}, \epsilon^{(N_{\mathrm{epochs}})})$ is returned.
Figure 3: Hidden spins: Two exemplary Ising models with full connectivity. This comparison shows the increase in trainable coupling parameters (graph edges) when the original input $\theta$ is mapped to a higher dimensional space using a preprocessing step $h_\mathrm{pre}$.
Figure 4: MSE loss on random data: Mean squared error, averaged over 30 randomly generated training sets of size $N=20$. The MSE loss is tracked as a function of the number of epochs (with $N_\mathrm{epochs}=50$). The Ising machine in this experiment is the simulated annealing algorithm bundled in the Ocean SDK. The decreasing trend of the loss demonstrates the trainability of the model.
Figure 5: MSE loss in function approximation: Evolution of the mean squared error loss during training for linear (a) and quadratic (b) functions. Results are shown for both simulated annealing (SA) and quantum annealing (QA); the number following the method name is the total number of hidden spins $n_\mathrm{total}$. SA and QA perform similarly for equal sizes; the fluctuations of QA are caused by the very low number of reads ($1$). For $f_\mathrm{lin}$, a larger number of hidden spins corresponds to better QA performance.
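The training procedure of Figure 2 can be sketched as full-batch gradient descent on the MSE. This is an illustrative sketch, not the authors' implementation: the energy convention, the exhaustive-search stand-in `brute_force_ising_machine` (hypothetical), the loss normalization $\frac1N\sum_a (F(\theta^{(a)})-y^{(a)})^2$, the initialization, and the learning rate are all assumptions. Each epoch calls the solver once per sample and uses only $\mathsf E_0$ and ${\textbf{z}}^\ast$ to update $\Gamma$, $\lambda$, and $\epsilon$.

```python
import itertools
import random


def brute_force_ising_machine(theta, gamma):
    """Hypothetical exact stand-in for an Ising machine (exhaustive search).

    Assumed energy: E(z) = sum_i theta_i z_i + sum_{i<j} gamma[(i, j)] z_i z_j.
    Returns the minimal energy E_0 and one minimizing configuration z*.
    """
    best_e, best_z = float("inf"), None
    for z in itertools.product((-1, 1), repeat=len(theta)):
        e = sum(t * zi for t, zi in zip(theta, z))
        e += sum(g * z[i] * z[j] for (i, j), g in gamma.items())
        if e < best_e:
            best_e, best_z = e, z
    return best_e, best_z


def mse(data, outputs, lam, eps):
    """Mean squared error (1/N) * sum_a (lam * E_0^(a) + eps - y^(a))**2."""
    return sum((lam * e0 + eps - y) ** 2 for (e0, _), (_, y) in zip(outputs, data)) / len(data)


def train(data, n, epochs=50, eta=0.02, seed=0):
    """Full-batch gradient descent on the MSE of F = lam * E_0 + eps.

    data: list of (theta, y) pairs with len(theta) == n.
    Returns (gamma, lam, eps, history); history holds the MSE before each
    epoch and once more after the final update.
    """
    rng = random.Random(seed)
    # Fully connected couplings Gamma_ij (i < j) with small random init (assumed).
    gamma = {(i, j): rng.uniform(-0.1, 0.1) for i in range(n) for j in range(i + 1, n)}
    lam, eps = 1.0, 0.0
    history = []
    for _ in range(epochs):
        # One Ising-machine call per training sample yields (E_0, z*).
        outputs = [brute_force_ising_machine(theta, gamma) for theta, _ in data]
        history.append(mse(data, outputs, lam, eps))
        residuals = [lam * e0 + eps - y for (e0, _), (_, y) in zip(outputs, data)]
        scale = 2.0 / len(data)  # derivative of (1/N) * r**2 with respect to F
        grad_gamma = {
            (i, j): scale * sum(r * lam * z[i] * z[j] for r, (_, z) in zip(residuals, outputs))
            for (i, j) in gamma
        }
        grad_lam = scale * sum(r * e0 for r, (e0, _) in zip(residuals, outputs))
        grad_eps = scale * sum(residuals)
        # Simultaneous update: all gradients above were computed with the old lam.
        gamma = {key: g - eta * grad_gamma[key] for key, g in gamma.items()}
        lam -= eta * grad_lam
        eps -= eta * grad_eps
    outputs = [brute_force_ising_machine(theta, gamma) for theta, _ in data]
    history.append(mse(data, outputs, lam, eps))
    return gamma, lam, eps, history


# Illustrative random data: N = 20 samples, 3 spins, uniform biases and labels.
rng = random.Random(1)
n_spins = 3
data = [([rng.uniform(-1, 1) for _ in range(n_spins)], rng.uniform(-1, 1)) for _ in range(20)]
gamma, lam, eps, history = train(data, n_spins)
print(f"MSE before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Swapping the brute-force solver for simulated or quantum annealing would only change the function that returns $(\mathsf E_0, {\textbf{z}}^\ast)$; with a sampling-based solver the returned configuration may not be an exact ground state, so the gradients become estimates.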

Theorems & Definitions (4)

Definition 1
Definition 2
Theorem 3
proof
