Let the Quantum Creep In: Designing Quantum Neural Network Models by Gradually Swapping Out Classical Components

Peiyong Wang, Casey R. Myers, Lloyd C. L. Hollenberg, Udaya Parampalli

TL;DR

This work proposes a framework in which classical neural network layers are gradually replaced by quantum layers that have the same type of input and output, while the flow of information between layers is kept unchanged. This differs from most current research on quantum neural networks, which favours an end-to-end quantum model.

Abstract

Artificial Intelligence (AI), with its multiplier effect and wide applications in multiple areas, could potentially be an important application of quantum computing. Since modern AI systems are often built on neural networks, the design of quantum neural networks becomes a key challenge in integrating quantum computing into AI. To provide a more fine-grained characterisation of the impact of quantum components on the performance of neural networks, we propose a framework where classical neural network layers are gradually replaced by quantum layers that have the same type of input and output while keeping the flow of information between layers unchanged. This is in contrast to most current research on quantum neural networks, which favours an end-to-end quantum model. We start with a simple three-layer classical neural network without any normalisation layers or activation functions, and gradually change the classical layers to the corresponding quantum versions. We conduct numerical experiments on image classification datasets such as MNIST, FashionMNIST and CIFAR-10 to demonstrate the change in performance brought by the systematic introduction of quantum components. Through this framework, our research sheds new light on the design of future quantum neural network models, where it could be more favourable to search for methods and frameworks that harness the advantages of both the classical and quantum worlds.
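
The gradual replacement described in the abstract can be sketched as a lookup from a replacement level to the layers that fill each vacancy. The layer names and the `replacement_level` values are taken from the Figure 1 caption; the table `LAYER_CHOICES` and the function `describe_hybrid_net` are hypothetical names introduced here for illustration and are not the authors' code.

```python
# Illustrative sketch only: it lists which layer fills each vacancy,
# it does not implement the layers themselves.
# Layer names follow the Figure 1 caption; this table is an assumption
# about how the three replacement levels could be encoded.
LAYER_CHOICES = {
    0: ("Conv2d", "Conv2d", "Linear"),
    1: ("FlippedQuanv3x3", "FlippedQuanv3x3", "Linear"),
    2: ("FlippedQuanv3x3", "FlippedQuanv3x3", "DataReUploadingLinear"),
}


def describe_hybrid_net(replacement_level):
    """Return the ordered layer names for a given replacement level.

    The flow of information (two convolution-like layers, a flatten
    operation, one linear-like layer) is the same at every level;
    only the occupant of each vacancy changes.
    """
    if replacement_level not in LAYER_CHOICES:
        raise ValueError(f"unknown replacement_level: {replacement_level}")
    first, second, last = LAYER_CHOICES[replacement_level]
    return [first, second, "Flatten", last]


for level in sorted(LAYER_CHOICES):
    print(level, " -> ".join(describe_hybrid_net(level)))
```

Keeping the non-trainable `Flatten` step fixed in the returned list reflects the point that the information passed between layers stays classical and unchanged across all levels.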

Paper Structure (16 sections, 27 equations, 31 figures, 12 tables)

This paper contains 16 sections, 27 equations, 31 figures, 12 tables.

Figures (31)

  • Figure 1: Overview of the framework proposed in this paper. The symbol for the quantum computer is inspired by cerezo2024does. (a) The information flow structure and the required dimensions of the input and output of each vacancy for candidate neural network layers. Double-lined boxes are the input and output of the neural network; dash-lined boxes are layer vacancies for candidate neural network layers. Alongside each block of layers are the devices on which the layer operations will mainly be executed. The information passed between layers and the flatten operation are classical, while the candidate neural network layers could be either classical or (simulated) quantum. (b) The hybrid neural network, HybridNet, when replacement_level = 0. In this case all vacancies are filled with classical neural network layers (Conv2d and Linear). All these layers are executed on a GPU with classical neural network libraries. (c) HybridNet when replacement_level = 1. In this case, the classical convolution layers Conv2d are replaced with their quantum counterpart, FlippedQuanv3x3, while the classical Linear layer is left unchanged. The two quantum layers could be executed either via GPU simulation or on an actual quantum device. In this paper, they are simulated on a GPU, since the current accessibility of quantum processors prohibits us from executing a very large number of circuits. (d) HybridNet when replacement_level = 2. In this case, all the classical layers in (b) are replaced with their quantum counterparts, i.e. Conv2d$\rightarrow$FlippedQuanv3x3 and Linear$\rightarrow$DataReUploadingLinear. All quantum layers are simulated on a GPU when training and testing the neural network model.
  • Figure 2: A more detailed account of the components in the neural network architecture shown in Fig. 1. Neural networks are essentially directed acyclic graphs, with layers as nodes and the flow of information as directed edges. The input layer (a) is just a placeholder for the input data. Grey-scale images from MNIST and FashionMNIST have only one channel, so the dimension is $1\times 32\times 32$ (after zero-padding); colour images from CIFAR-10 have three channels, so the dimension is $3\times 32\times 32$. For the trainable layers (b), each has a required input dimension and an output dimension determined by the hyper-parameters of the layer. The dimension (shape) of the incoming data is the same as that of the output of the previous layer. The dimension of the outgoing data (feature map) is $C(\text{hannel})\times H(\text{eight})\times W(\text{idth})$ for Conv2d and FlippedQuanv3x3 layers. For Linear and DataReUploadingLinear layers, it is a single number giving the vector dimension. Both Conv2d-like and Linear-like layers have hyper-parameters that control the behaviour of the layer and change the dimension of the output. The output layer (c) is also a placeholder, for the information that goes to the loss function. A neural network also contains non-trainable layers such as the flatten layer (d), which reshapes the multi-channel feature map from a convolution/quanvolution layer to a 1D vector. Putting all these together, with layers as nodes and information flow as directed edges, we have the architecture for a neural network (e).
  • Figure 3: (a) Images and feature maps with a single channel need only one circuit for each patch $x$. (b) For images and feature maps with multiple channels, the patch of the image within the view of the FlippedQuanv3x3 kernel is a 3-D tensor with the shape $(C, 3, 3)$. In this example we take $C=3$, so three circuits with different parameters, one for the observable constructed from each channel, are required to calculate the output of the FlippedQuanv3x3 kernel operation.
  • Figure 4: The DataReUploadingLinear layer at the end of the hybrid neural network. It takes a 12544-dimensional feature vector from the Flatten layer, pads it with zeros and reshapes it to a $2^7\times 2^7$ square matrix $M$. A quantum Hamiltonian $H_M$ is constructed from $M$, and this Hamiltonian is used to construct the time-evolution operator $W = e^{-i H_M/L}$, where $L$ is the number of layers of the data re-uploading circuit.
  • Figure 5: Sample images from the MNIST dataset. The size of the original images is 28 by 28. Images are padded with zeros to 32 by 32.
  • ...and 26 more figures
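
The shape bookkeeping in the Figure 4 caption can be sketched with NumPy. The sketch pads a 12544-dimensional vector to $2^{14}$ entries, reshapes it to a $2^7\times 2^7$ matrix $M$, and builds $W = e^{-i H_M/L}$. Several parts are assumptions rather than details taken from the captions: the Hermitian part $(M + M^\dagger)/2$ is used as a stand-in for $H_M$ because the caption does not say how $H_M$ is constructed from $M$; the value `n_layers=4` is arbitrary; and the reading of 12544 as $16\times 28\times 28$ (two unpadded, stride-1 $3\times 3$ layers applied to a $32\times 32$ input, with 16 output channels) is an inference from the numbers, not something the captions state. All function names are hypothetical.

```python
import numpy as np


def conv3x3_out(size):
    """Spatial size after a 3x3 kernel with stride 1 and no padding (assumed)."""
    return size - 2


def pad_and_reshape(features, n_qubits=7):
    """Zero-pad a 1D feature vector and reshape it to a 2^n x 2^n matrix."""
    dim = 2 ** n_qubits
    if features.size > dim * dim:
        raise ValueError("feature vector does not fit in the target matrix")
    padded = np.zeros(dim * dim, dtype=features.dtype)
    padded[: features.size] = features
    return padded.reshape(dim, dim)


def hermitian_from_matrix(m):
    """ASSUMPTION: use the Hermitian part of M as the Hamiltonian H_M."""
    return (m + m.conj().T) / 2


def time_evolution(h, n_layers):
    """Compute W = exp(-i H / L) for Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    phases = np.exp(-1j * evals / n_layers)
    # V diag(phases) V^dagger
    return (evecs * phases) @ evecs.conj().T


# Shape check: 32 -> 30 -> 28 under the assumed unpadded 3x3 layers.
side = conv3x3_out(conv3x3_out(32))
print(side, 16 * side * side)

rng = np.random.default_rng(0)
features = rng.standard_normal(12544)   # random stand-in for the Flatten output
M = pad_and_reshape(features)           # 128 x 128
H_M = hermitian_from_matrix(M)
W = time_evolution(H_M, n_layers=4)     # n_layers is an arbitrary choice
print(M.shape, W.shape)
```

Exponentiating through `np.linalg.eigh` keeps the example dependent on NumPy only, and any Hermitian choice of $H_M$ makes $W$ unitary, which is what lets it act as a gate in a data re-uploading circuit.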