Information plane and compression-gnostic feedback in quantum machine learning

Nathan Haboury, Mo Kordzanganeh, Alexey Melnikov, Pavel Sekatski

TL;DR

This paper extends the information plane to quantum learning models and uses the resulting measure of data compression to improve training in two ways: via a multiplicative regularization of the loss function, or with a compression-gnostic scheduler of the learning rate. It also analyzes the impact of these modifications on the performance of a classical neural network in a classification task.

Abstract

The information plane (Tishby et al. arXiv:physics/0004057, Shwartz-Ziv et al. arXiv:1703.00810) has been proposed as an analytical tool for studying the learning dynamics of neural networks. It provides quantitative insight into how the model approaches the learned state by approximating a minimal sufficient statistic. In this paper we extend this tool to the domain of quantum learning models. In a second step, we study how the insight into how much the model compresses the input data (provided by the information plane) can be used to improve a learning algorithm. Specifically, we consider two ways to do so: via a multiplicative regularization of the loss function, or with a compression-gnostic scheduler of the learning rate (for algorithms based on gradient descent). Both ways turn out to be equivalent in our implementation. Finally, we benchmark the proposed learning algorithms on several classification and regression tasks using variational quantum circuits. The results demonstrate an improvement in test accuracy and convergence speed for both synthetic and real-world datasets. Additionally, with one example, we analyze the impact of the proposed modifications on the performance of neural networks in a classification task.
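
The equivalence between the two feedback schemes mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain gradient descent and a multiplicative factor that is held constant during the gradient step; under those assumptions, scaling the loss by a factor gives the same update as scaling the learning rate by that factor. The linear model, the form `1 + alpha * compression`, and the numerical values below are hypothetical stand-ins chosen for illustration, not the paper's implementation.

```python
import numpy as np

def loss_grad(theta, x, y):
    """Gradient of the mean squared error of a toy linear model."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / len(y)

def step_scaled_loss(theta, x, y, lr, factor):
    # One gradient step on factor * L(theta), with factor held constant
    return theta - lr * (factor * loss_grad(theta, x, y))

def step_scaled_lr(theta, x, y, lr, factor):
    # One gradient step on L(theta) with the learning rate scaled by factor
    return theta - (lr * factor) * loss_grad(theta, x, y)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = rng.normal(size=32)
theta = rng.normal(size=3)

alpha = 0.5        # hypothetical feedback strength
compression = 1.3  # hypothetical value of I(T:X); not computed here
factor = 1.0 + alpha * compression  # assumed functional form

a = step_scaled_loss(theta, x, y, lr=0.1, factor=factor)
b = step_scaled_lr(theta, x, y, lr=0.1, factor=factor)
print(np.allclose(a, b))
```

With adaptive optimizers that rescale gradients, the two schemes would generally no longer coincide, so the sketch covers only the plain gradient-descent case.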

Paper Structure

This paper contains 29 sections, 19 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: (Main) The information on the label $I(T_{1}^Z:Y)$ (dashed line) and on the data $I(T_{1}^Z:X)$ (solid line) encoded in the distribution of the $\sigma_Z$ observable on the first qubit after the last layer, see Eq. \ref{eq: T1z}. (Inset) From top to bottom: $I(T_\text{all}:X), I(T_{1}:X)$ and $I(T_1^Z:X)$ -- the information on the data encoded by all the qubits, the first qubit, and the $\sigma_Z$ component of the first qubit, see Eqs. (\ref{eq: Tall}-\ref{eq: T1z}). The bottom line is the information on the label $I(T_{1}^Z:Y)$. The two bottom lines are the same as in the main figure.
  • Figure 2: Architecture of the circuits for the quantum learning models for the case of $N=4$ qubits. a) The overall circuit is composed of several data re-uploading layers in series, followed by the variational layers. At the end, the first qubit is measured in the computational basis, and the expectation value of the corresponding Pauli-Z observable is the classical output of the circuit. b) The internal structure of the data re-uploading layers, composed of parametrized rotations $X_\theta$ around the Pauli-X axis, rotations $Z_x$ around the Pauli-Z axis that encode the data features, and two-qubit CNOT gates. The number of encoding gates $Z_x$ can vary depending on the dataset and the number of qubits in the circuit; that is, in a given data re-uploading layer the gates $Z_x$ do not necessarily act on every qubit. c) The internal structure of the variational layers, which do not directly depend on the input data.
  • Figure 3: a), b) Test accuracy as a function of $\alpha$ for static and dynamic $\alpha$. The blue curve and background show the mean and standard deviation. The red dashed line shows the models with $\alpha = 0$. Maximum and minimum show the best and worst models. c), d) Number of steps to converge as a function of $\alpha$ for static and dynamic $\alpha$. The blue curve and background show the mean and standard deviation. The red dashed line shows the models with $\alpha = 0$. Maximum and minimum show the best models within various initial weights.
  • Figure 4: a), b) Ratio of training and test accuracy for static and dynamic $\alpha$ values. The blue curve and background show the mean and standard deviation. The red dashed line shows the models with $\alpha = 0$. Maximum and minimum show the best models within various initial weights.
  • Figure 5: Visualization of the dataset from different angles. In $a)$, the four clouds, inclined by an angle, can be clearly distinguished. In $b)$, the normal distribution of each cloud in the dataset is visible. The red points belong to Class 0, while the blue points belong to Class 1. The clouds are centered around the point (0, 0) in the $x-y$ plane. The $z$ coordinates of the clouds are set according to the equation $z = a + x\cos(60)$, where $x$ is the $x$ coordinate of a point and $a$ takes one of the values $\{0,2,4,6\}$, one for each of the four clouds.
  • ...and 4 more figures
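
The mutual-information quantities named in the Figure 1 caption, such as $I(T_{1}^Z:Y)$, can be made concrete with a small sketch. It computes the mutual information of two discrete variables from their joint probability table using the standard definition. The function name `mutual_information` and the two example tables are hypothetical; the paper's own estimator (its Eqs. for $T_\text{all}$, $T_1$ and $T_1^Z$) is not reproduced here.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(A:B) in bits from a joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalize in case counts are passed
    pa = joint.sum(axis=1, keepdims=True)  # marginal of the row variable
    pb = joint.sum(axis=0, keepdims=True)  # marginal of the column variable
    mask = joint > 0                       # skip zero entries (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

# Hypothetical joint tables: rows are the +/-1 outcomes of a Pauli-Z
# measurement, columns are the two values of a binary label.
independent = np.array([[0.25, 0.25], [0.25, 0.25]])  # outcome says nothing about the label
perfect = np.array([[0.5, 0.0], [0.0, 0.5]])          # outcome determines the label

mi_independent = mutual_information(independent)
mi_perfect = mutual_information(perfect)
print(mi_independent, mi_perfect)
```

The two tables bracket the possible range for a binary label: zero bits when the measurement outcome is independent of the label, and one bit when it determines the label exactly.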