Distilling the knowledge with quantum neural networks

Yuxuan Yan, Sitian Qian, Qi Zhao, Xingjian Zhang

Abstract

Quantum Neural Networks (QNNs) are a promising class of quantum machine learning models with potential quantum advantages when implemented on scalable, error-corrected quantum computers. However, as system sizes increase, deploying QNNs becomes challenging. Similar to their classical counterparts, a key obstacle to their practical applications is that large-scale QNNs may not be easily deployed on smaller systems that have limited resources. Here, we tackle this challenge by compressing QNNs via knowledge distillation. We demonstrate how well-trained QNNs on large systems can be distilled into smaller architectures with similar configurations. We numerically show that knowledge distillation helps reduce the training cost of QNNs in terms of the number of qubits and circuit depth. Additionally, we find that a self-knowledge-distillation approach can accelerate training convergence. We believe our results offer new strategies for the efficient compression and practical deployment of QNNs.

Paper Structure (10 sections, 9 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Illustration of the proposed QNN-KD framework. The dataset, which comprises input data and labels, is processed by both a teacher model and a student model. The input can be either classical data represented as vectors, $x$, or quantum data represented as states, $\ket{\psi}$. The labels, $y$, may carry different physical meanings, such as the digit shown in a handwritten image or the quantum phase of a state. Data processing may be required before the input is fed into the QNN models. The classical input is first processed with Principal Component Analysis (PCA) and then encoded into quantum states; in our work, we apply a quantum phase encoder. The quantum input may be processed with a Quantum AutoEncoder (QAE) to match the size of the QNN. The teacher model generates supervisory signals. The student QNN model learns to mimic the teacher's outputs by minimizing the knowledge distillation loss $\mathcal{L}_{\mathrm{KD}}$ over a set of tunable parameters $\{\theta_i^s\}_i$ through gradient-based optimization. We consider two cases: the teacher is a larger QNN with an architecture similar to the student's, or, for self-KD, the teacher is the same model as the student. QNN-KD can efficiently reduce the size of the QNN and, in the self-KD scenario, accelerate training.
  • Figure 2: Hardware-efficient QNN architecture illustrated with four qubits and a single layer. The single-qubit rotations consist of sequential rotations around the $X$-, $Y$-, and $Z$-axes, denoted by blue, orange, and green boxes, respectively. The entangling operations are implemented using controlled-NOT (CNOT) gates. After the single-qubit rotations around each of the three axes, a layer of CNOT gates that entangles multiple qubits is applied. The circuit alternates between layers of single-qubit gates and layers of CNOT gates. The circuit depth is defined as the number of CNOT gate layers.
  • Figure 3: Architecture of the Quantum AutoEncoder (QAE), illustrated with a compression from two qubits to one qubit. In our simulation, we employ a similar architecture for compression. The initial state lies in the joint system given by $A$ and $B$. The state is first encoded into a single-qubit state on system $A$, where we depict the encoder circuit by the dashed box. Afterward, the system is decoded with an auxiliary system, $B'$, where we depict the decoder circuit by the dotted box. The single-qubit rotation gates are defined as in Figure 2, with the values representing those in our numerical experiment. The yellow boxes represent two-qubit entangling gates. The gate $e^{-i(ZZ)}$ represents $\exp[-i(Z\otimes Z)]$ with $Z$ being the Pauli-$Z$ operator, and the other two-qubit gates are defined similarly.
  • Figure 4: The effect of KD on student model accuracy in topological phase classification as a function of network layers for a 15-qubit system. The plot compares the test (solid lines) and train (dashed lines) accuracies for student models trained without KD (blue) and with KD (green). The constant dotted lines represent the baseline test (purple) and train (brown) accuracies of the larger teacher model.
  • Figure 5: The effect of KD on student model accuracy in MNIST binary classification as a function of network layers for a 15-qubit system. The plot compares the test (solid lines) and train (dashed lines) accuracies for student models trained without KD (blue) and with KD (green). The constant dotted lines represent the baseline test (purple) and train (brown) accuracies of the larger teacher model.
  • ...and 1 more figures
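
The hardware-efficient layer described in the Figure 2 caption can be sketched as a small state-vector simulation. This is an illustrative sketch only, not the authors' implementation. The rotation order (X, then Y, then Z) and the definition of depth as the number of CNOT layers follow the caption. The nearest-neighbour CNOT chain, the all-zero initial state, and every function name below are assumptions made for this example.

```python
import cmath
import math


def rx(t):
    """Rotation around the X-axis by angle t."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t):
    """Rotation around the Y-axis by angle t."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def rz(t):
    """Rotation around the Z-axis by angle t."""
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state (qubit 0 = leftmost bit)."""
    out = list(state)
    bit = 1 << (n - 1 - q)
    for i in range(len(state)):
        if (i & bit) == 0:
            a0, a1 = state[i], state[i | bit]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[i | bit] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def apply_cnot(state, control, target, n):
    """Swap the target-bit amplitudes wherever the control bit is 1."""
    out = list(state)
    cbit = 1 << (n - 1 - control)
    tbit = 1 << (n - 1 - target)
    for i in range(len(state)):
        if (i & cbit) and not (i & tbit):
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


def hardware_efficient_circuit(params, n):
    """Run len(params) layers on |0...0>.

    params[layer][q] is a (theta_x, theta_y, theta_z) triple for qubit q.
    The circuit depth, in the caption's sense, is len(params).
    """
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for layer in params:
        for q, (tx, ty, tz) in enumerate(layer):
            for gate in (rx(tx), ry(ty), rz(tz)):
                state = apply_1q(state, gate, q, n)
        # Assumed entangling pattern: linear chain of CNOTs q -> q + 1.
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    return state
```

Calling `hardware_efficient_circuit(params, 4)` with a single inner list of four angle triples corresponds to the four-qubit, one-layer case drawn in the figure. Appending further inner lists increases the depth by one CNOT layer each.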