Layer-wise QUBO-Based Training of CNN Classifiers for Quantum Annealing

Mostafa Atallah; Rebekah Herrman

TL;DR

An iterative framework based on Quadratic Unconstrained Binary Optimization (QUBO) for training the classifier head of convolutional neural networks (CNNs) via quantum annealing, entirely avoiding gradient-based circuit optimization.

Abstract

Variational quantum circuits for image classification suffer from barren plateaus, while quantum kernel methods scale quadratically with dataset size. We propose an iterative framework based on Quadratic Unconstrained Binary Optimization (QUBO) for training the classifier head of convolutional neural networks (CNNs) via quantum annealing, entirely avoiding gradient-based circuit optimization. Following the Extreme Learning Machine paradigm, convolutional filters are randomly initialized and frozen, and only the fully connected layer is optimized. At each iteration, a convex quadratic surrogate derived from the feature Gram matrix replaces the non-quadratic cross-entropy loss, yielding an iteration-stable curvature proxy. A per-output decomposition splits the $C$-class problem into $C$ independent QUBOs, each with $(d+1)K$ binary variables, where $d$ is the feature dimension and $K$ is the bit precision, so that problem size depends on the image resolution and bit precision, not on the number of training samples. We evaluate the method on six image-classification benchmarks (sklearn digits, MNIST, Fashion-MNIST, CIFAR-10, EMNIST, KMNIST). A precision study shows that accuracy improves monotonically with bit resolution, with 10 bits representing a practical minimum for effective optimization; the 15-bit formulation remains within the qubit and coupler limits of current D-Wave Advantage hardware. The 20-bit formulation matches or exceeds classical stochastic gradient descent on MNIST, Fashion-MNIST, and EMNIST, while remaining competitive on CIFAR-10 and KMNIST. All experiments use simulated annealing, establishing a baseline for direct deployment on quantum annealing hardware.
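The per-output QUBO described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation. Each entry of the update $\Delta w_c$ (weights plus bias, so $d+1$ entries) is assumed to be a $K$-bit fixed-point number in $[-r, r)$. The Gram matrix is assumed to be scaled as $H = \Phi^\top\Phi/N$, with a bias column appended to the features $\Phi$. $g_c$ is taken as column $c$ of the cross-entropy gradient $\Phi^\top(P - Y)/N$. The function names `encoding_matrix` and `class_qubo` and the parameter `radius` ($r$) are hypothetical. Substituting $\Delta w_c = E b + o$ into $g_c^\top \Delta w_c + \tfrac{1}{2}\Delta w_c^\top H \Delta w_c$ yields a QUBO over $(d+1)K$ bits. The training samples enter only through $H$ and $g_c$.

```python
import numpy as np


def encoding_matrix(n_vars, n_bits, radius):
    """Hypothetical K-bit fixed-point encoding: dw = E @ b + offset, dw_i in [-radius, radius)."""
    step = 2.0 * radius / 2**n_bits                   # value of the least-significant bit
    bit_values = step * 2.0 ** np.arange(n_bits)      # bit k contributes step * 2**k
    E = np.kron(np.eye(n_vars), bit_values[None, :])  # shape (n_vars, n_vars * n_bits)
    offset = -radius * np.ones(n_vars)
    return E, offset


def class_qubo(H, g, n_bits, radius):
    """QUBO whose energy b^T Q b + const equals g^T dw + 0.5 dw^T H dw."""
    E, o = encoding_matrix(len(g), n_bits, radius)
    Q = 0.5 * E.T @ H @ E                             # pairwise couplings from the Gram matrix
    # b_i**2 == b_i for binary variables, so linear terms go on the diagonal
    Q[np.diag_indices_from(Q)] += E.T @ (g + H @ o)
    const = g @ o + 0.5 * o @ H @ o                   # energy offset, irrelevant to the argmin
    return Q, const, E, o


rng = np.random.default_rng(0)
N, d, C, K = 50, 2, 3, 3
feats = rng.normal(size=(N, d))                       # stand-in for frozen CNN features
Phi = np.hstack([feats, np.ones((N, 1))])             # bias column -> d + 1 columns
Y = np.eye(C)[rng.integers(0, C, size=N)]             # one-hot labels
W = np.zeros((d + 1, C))                              # current FC weights
Z = Phi @ W
P = np.exp(Z - Z.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)                     # softmax probabilities
H = Phi.T @ Phi / N                                   # Gram matrix, shared by all C outputs
G = Phi.T @ (P - Y) / N                               # column c is the gradient for output c
Q, const, E, o = class_qubo(H, G[:, 0], K, radius=1.0)
print(Q.shape)                                        # (9, 9): (d + 1) * K binary variables
```

With $d = 2$ and $K = 3$ the QUBO has $(2+1)\cdot 3 = 9$ binary variables. Raising $N$ changes only the entries of $H$ and $g_c$, never the size of $Q$.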

Paper Structure

This paper contains 16 sections, 18 equations, 4 figures, 9 tables, and 1 algorithm.

Introduction
Related Work
QUBO Formulation for CNN Training
Quadratic Surrogate for Iterative Updates
The Gram Matrix as Curvature Proxy
Gradient from Softmax Residuals
Binary Encoding
QUBO Matrix Construction
Layer-Wise and Per-Class Decomposition
Problem Setup
Per-Output Decomposition
Algorithm
Experimental Results
sklearn Digits Dataset
Multi-Dataset Benchmark
...and 1 more section

Figures (4)

Figure 1: Training pipeline matching Algorithm 1. Green blocks: one-time initialization (frozen CNN features and Gram matrix). Gray blocks: iterative phases 1 and 3. Cyan blocks: Phase 2 inner loop over the $C$ classes. The outer loop repeats for $T$ iterations.
Figure 2: CNN architecture used in all experiments. Convolutional layers are randomly initialized and frozen; only the fully connected (FC) classifier head is trained via iterative QUBO solves.
Figure 3: Convergence curves for Classical FC (SGD) and QUBO methods with different bit precisions over 1000 iterations.
Figure 4: Sample predictions from Classical FC (SGD) and QUBO models with different bit precisions. Each row shows predictions for one test digit, with columns showing probability distributions from each model.
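The pipeline in Figure 1 can be sketched end to end as below: a one-time feature and Gram-matrix setup, then $T$ outer iterations with a per-class inner loop. This is a hedged illustration, not Algorithm 1 itself. The following are all assumptions:

- phase 1 computes softmax residuals and the gradient, and phase 3 applies the weight update;
- each update uses a $K$-bit fixed-point encoding in $[-r, r)$;
- a guard skips class updates whose surrogate value is not negative;
- a plain single-bit-flip simulated annealer solves each QUBO;
- a ReLU random projection stands in for the frozen CNN features.

Every function name (`build_qubo`, `anneal`, `train_fc_head`) is hypothetical.

```python
import numpy as np


def build_qubo(H, g, n_bits, radius):
    """Per-output QUBO: b^T Q b + const == g^T dw + 0.5 dw^T H dw, with dw = E b + o."""
    n = len(g)
    step = 2.0 * radius / 2**n_bits                   # assumed fixed-point resolution
    E = np.kron(np.eye(n), step * 2.0 ** np.arange(n_bits)[None, :])
    o = -radius * np.ones(n)                          # dw_i ranges over [-radius, radius)
    Q = 0.5 * E.T @ H @ E
    Q[np.diag_indices_from(Q)] += E.T @ (g + H @ o)   # linear terms on the diagonal
    return Q, g @ o + 0.5 * o @ H @ o, E, o


def anneal(Q, b0, n_sweeps, rng):
    """Single-bit-flip simulated annealing on b^T Q b; returns the best state seen."""
    b, scale = b0.copy(), np.abs(Q).max() + 1e-12
    e = b @ Q @ b
    best_b, best_e = b.copy(), e
    for temp in scale * np.geomspace(1.0, 1e-3, n_sweeps):
        for i in rng.permutation(len(b)):
            s = 1.0 - 2.0 * b[i]                      # +1 for a 0->1 flip, -1 for 1->0
            delta = 2.0 * s * (Q[i] @ b) + Q[i, i]    # exact energy change (Q symmetric)
            if delta <= 0 or rng.random() < np.exp(-delta / temp):
                b[i] += s
                e += delta
                if e < best_e:
                    best_b, best_e = b.copy(), e
    return best_b, best_e


def cross_entropy(Phi, W, Y):
    """Mean softmax cross-entropy and the class probabilities P."""
    Z = Phi @ W
    Z = Z - Z.max(axis=1, keepdims=True)
    log_p = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return -(Y * log_p).sum(axis=1).mean(), np.exp(log_p)


def train_fc_head(Phi, Y, n_iters=10, n_bits=6, radius=0.5, n_sweeps=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = Phi.shape                                  # D = d + 1 (bias column included)
    C = Y.shape[1]
    H = Phi.T @ Phi / N                               # one-time: Gram matrix for all classes
    W = np.zeros((D, C))
    b_zero = np.tile(np.eye(n_bits)[-1], D)           # bit pattern that encodes dw = 0
    history = []
    for _ in range(n_iters):                          # outer loop: T iterations
        loss, P = cross_entropy(Phi, W, Y)
        history.append(loss)
        G = Phi.T @ (P - Y) / N                       # assumed phase 1: softmax-residual gradient
        dW = np.zeros_like(W)
        for c in range(C):                            # assumed phase 2: one QUBO per output
            Q, const, E, o = build_qubo(H, G[:, c], n_bits, radius)
            b, e = anneal(Q, b_zero, n_sweeps, rng)
            if e + const < 0:                         # guard: keep only surrogate-decreasing updates
                dW[:, c] = E @ b + o
        W += dW                                       # assumed phase 3: apply the update
    history.append(cross_entropy(Phi, W, Y)[0])
    return W, history


rng = np.random.default_rng(1)
N, n_in, d, C = 200, 8, 4, 3
X = rng.normal(size=(N, n_in))
Y = np.eye(C)[np.argmax(X[:, :C], axis=1)]                # synthetic one-hot labels
feats = np.maximum(X @ rng.normal(size=(n_in, d)), 0.0)  # frozen random features (stand-in)
Phi = np.hstack([feats, np.ones((N, 1))])                # d + 1 columns incl. bias
W, history = train_fc_head(Phi, Y)
print(f"loss {history[0]:.4f} -> {history[-1]:.4f}")     # starts at ln 3 because W = 0
```

Every eigenvalue of the per-sample softmax Hessian block $\mathrm{diag}(p) - pp^\top$ is at most $1/2$. As a result, in this sketch the surrogate built from the unscaled Gram matrix upper-bounds the change in cross-entropy, so the guard keeps the recorded training loss non-increasing. In principle, a hardware sampler could take the place of `anneal` without changing the rest of the loop.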
