A Post-Training Approach for Mitigating Overfitting in Quantum Convolutional Neural Networks

Aakash Ravindra Shinde, Charu Jain, Amir Kalev

TL;DR

This work finds that a straightforward adaptation of a classical post-training method, known as neuron dropout, to the quantum setting leads to a significant and undesirable consequence: a substantial decrease in success probability of the QCNN.

Abstract

The quantum convolutional neural network (QCNN), an early application for quantum computers in the NISQ era, has repeatedly been shown to succeed as a machine learning (ML) algorithm, reaching significant accuracy on several tasks. Like the classical counterpart from which it is derived, the QCNN is prone to overfitting. Overfitting is a typical shortcoming of ML models that are trained too closely to the available training dataset and perform relatively poorly on unseen datasets for a similar problem. In this work we study post-training approaches for mitigating overfitting in QCNNs. We find that a straightforward adaptation of a classical post-training method, known as neuron dropout, to the quantum setting leads to a significant and undesirable consequence: a substantial decrease in the success probability of the QCNN. We argue that this effect exposes the crucial role of entanglement in QCNNs and the vulnerability of QCNNs to entanglement loss. Hence, we propose a parameter adaptation method as an alternative. Our method is computationally efficient and is found to successfully handle overfitting in the test cases.

Paper Structure

This paper contains 11 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: General QCNN architecture. The QCNN includes three key components: a feature map, a parametric quantum circuit that includes concatenated convolution and pooling layers, and a measurement followed by an optimization unit. In this work the convolution and pooling layers are constructed from two-qubit gates (building blocks). Examples of the building blocks we used are given in Fig. 2.
  • Figure 2: Two-qubit building blocks of the implemented QCNN. We used the architecture proposed and implemented in [hur2022quantum]. The building blocks for the convolution layers are given in subfigures (a) and (b), where $U_3(\theta, \varphi, \lambda) = R_z(\varphi)R_x(-\frac{\pi}{2})R_z(\theta)R_x(\frac{\pi}{2})R_z(\lambda)$, while the building block for the pooling layer is shown in subfigure (c).
  • Figure 3: Parameter evolution over time (iterations). The top plot shows how the values of three (out of 78) parameters, chosen for illustration purposes from one experiment we ran, change with the number of iterations. The bottom row shows a zoom on the last 100 iterations (color coded). In the PTA, we may choose to adjust a parameter's value according to the range of its fluctuation in the last 100 iterations.
  • Figure 4: Example of Medical MNIST dataset images. Top row: Abdominal CT; bottom row: Chest CT. The similarity between Chest CT and Abdominal CT images implies that they would be hard to classify and, in addition, that the learning model may be prone to overfitting.
  • Figure 5: Example of BraTS 2019 dataset brain images. The high-resolution brain images in the top row were resized to 64 pixels (bottom row). The resulting images appear unclear and pixelated, which poses a challenge for classification.
  • ...and 1 more figure
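
The $U_3$ decomposition quoted in the Figure 2 caption can be checked numerically. The sketch below is not taken from the paper: it assumes the rotation convention $R_a(\alpha) = e^{-i\alpha\sigma_a/2}$ and the common OpenQASM-style matrix for $U_3$, and the helper names (`rz`, `rx`, `u3_reference`, `u3_from_caption`, `equal_up_to_global_phase`) are hypothetical. Under those assumptions the product in the caption agrees with $U_3$ up to the global phase $e^{i(\varphi+\lambda)/2}$, which has no effect on measurement outcomes.

```python
import numpy as np

def rz(angle):
    """Rotation about Z, assuming the convention exp(-i * angle * Z / 2)."""
    return np.array([[np.exp(-1j * angle / 2), 0],
                     [0, np.exp(1j * angle / 2)]])

def rx(angle):
    """Rotation about X, assuming the convention exp(-i * angle * X / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s],
                     [-1j * s, c]])

def u3_reference(theta, phi, lam):
    """U3 matrix in the OpenQASM-style convention (an assumption, not from the paper)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(1j * lam) * s],
                     [np.exp(1j * phi) * s, np.exp(1j * (phi + lam)) * c]])

def u3_from_caption(theta, phi, lam):
    """Product written in the Figure 2 caption; the rightmost factor acts first."""
    return rz(phi) @ rx(-np.pi / 2) @ rz(theta) @ rx(np.pi / 2) @ rz(lam)

def equal_up_to_global_phase(a, b, tol=1e-10):
    """True if the unitaries a and b differ only by a global phase factor."""
    # For unitaries, |Tr(a^dagger b)| reaches the dimension only when a^dagger b = phase * I.
    return abs(abs(np.trace(a.conj().T @ b)) - a.shape[0]) < tol

# Compare the two forms at randomly drawn angles.
rng = np.random.default_rng(0)
theta, phi, lam = rng.uniform(-np.pi, np.pi, size=3)
print(equal_up_to_global_phase(u3_from_caption(theta, phi, lam),
                               u3_reference(theta, phi, lam)))
```

The comparison is made up to a global phase rather than entry by entry because the matching depends on the chosen rotation convention. Multiplying the caption's product by $e^{i(\varphi+\lambda)/2}$ recovers the reference matrix exactly under the conventions assumed here.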