Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning

Pablo Bermejo; Borja Aizpurua; Roman Orus

TL;DR

This work tackles slow convergence in gradient-based optimization caused by barren plateaus and local minima, in both classical and quantum machine learning. It introduces a self-consistent boosting strategy that enlarges the effective search space by treating the cost function $f(\vec{x})$ as an extra coordinate, so that each point is described as $P=[x_1,\dots,x_n,f(\vec{x})]$. Updates are performed in transformed coordinates, using either hyperspherical coordinates $(\theta_1,\dots,\theta_n,r)$ or frame rotations $R_2\in SO(2)$ in the $(x_i,f(\vec{x}))$ plane. By alternating between an update in the transformed space and a projection back onto the original landscape, the method mitigates flat regions without altering the objective values, and the authors report faster convergence and improved stability across a number of quantum ML benchmarks. The approach is general and could reduce the computational cost and energy consumption of AI systems. Future work includes other coordinate changes, dynamic non-linear transformations, and applications to larger deep-learning and tensor-network models.
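The hyperspherical option can be illustrated in one dimension, where it reduces to polar coordinates. The following is a minimal sketch and not the authors' implementation. The toy cost function, the learning rate, and all function names are hypothetical. The summary does not say which quantity is differentiated with respect to the angle, so the update rule used here (descending the height coordinate $r\sin\theta$ at fixed $r$) is an assumption.

```python
import math

def cost(x):
    # Hypothetical 1D cost (not from the paper): plateau of height 1
    # for x <= 2 and x >= 4, parabolic well with its minimum at x = 3.
    return min(1.0, (x - 3.0) ** 2)

def grad(x, h=1e-6):
    # Central finite-difference estimate of df/dx.
    return (cost(x + h) - cost(x - h)) / (2.0 * h)

def polar_step(x, lr):
    # Describe P = (x, f(x)) in polar coordinates (r, theta).
    fx = cost(x)
    r = math.hypot(x, fx)
    theta = math.atan2(fx, x)
    # Assumed update rule: descend the height coordinate r*sin(theta)
    # at fixed r; its derivative with respect to theta is r*cos(theta).
    theta_new = theta - lr * r * math.cos(theta)
    # P' = (r, theta_new): keep only its x coordinate and "collapse" back
    # onto the landscape by re-evaluating the cost there.
    x_new = r * math.cos(theta_new)
    return x_new, cost(x_new)

def escape_plateau(x, lr=0.3, tol=1e-9, max_steps=50):
    # Apply polar steps only while the ordinary gradient vanishes.
    steps = 0
    while abs(grad(x)) < tol and steps < max_steps:
        x, _ = polar_step(x, lr)
        steps += 1
    return x, steps

x0 = 1.0  # on the plateau: plain gradient descent stalls here
x1, n_steps = escape_plateau(x0)
print(f"gradient at x0 = {grad(x0):.3g}")
print(f"after {n_steps} polar steps: x = {x1:.4f}, f = {cost(x1):.4f}")
```

In this toy setup the angle update pushes $x$ away from the origin, which happens to be the direction of the well. That is a property of the chosen example and of the assumed update rule, not a general guarantee. Once the ordinary gradient is non-zero again, a standard gradient step can take over.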

Abstract

Machine learning algorithms, in both their classical and quantum versions, rely heavily on gradient-based optimization algorithms such as gradient descent and the like. Their overall performance depends on the appearance of local minima and barren plateaus, which slow down calculations and lead to non-optimal solutions. In practice, this results in dramatic computational and energy costs for AI applications. In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, alleviating the effect of barren plateaus and local minima. Our method is based on coordinate transformations, somewhat similar to variational rotations, which add extra directions in parameter space that depend on the cost function itself and allow the configuration landscape to be explored more efficiently. The validity of our method is benchmarked by boosting a number of quantum machine learning algorithms, obtaining a very significant improvement in their performance.

Paper Structure (6 sections, 2 equations, 10 figures, 3 tables)

Introduction
Methodology
Option 1: hyperspherical coordinates
Option 2: plane rotations
Results
Conclusions and further work

Figures (10)

Figure 1: (a) Cost function $f(x)$ presents a plateau as a function of the optimization variable $x$, so that gradient methods stall at point $P$ due to a null gradient. (b) A change in the polar coordinates of point $P$ leads to a point $P'$, which can then be "collapsed" back to the landscape of the cost function, leading to point $P_c$. Optimization from point $P_c$ is no longer stalled, since the gradient is non-zero. (c) A frame rotation leads to a description of point $P$ with different Cartesian coordinates $f_r(x_r)$ and $x_r$. In the new rotated frame, the gradient at $P$ is non-zero, so that gradient methods are no longer stalled.
Figure 2: [Color online] Improvement in the average number of iterations as a function of the learning rate, for the algorithm in Ref. Cost_function_dependent to alleviate barren plateaus with local cost functions. Comparison between PennyLane and the Change of Coordinates (CC) implementations.
Figure 3: [Color online] Convergence of cost function versus number of optimization steps, in function fitting using a quantum signal processing polynomial, for the algorithm in Ref. function_fitting_QSP. Comparison between PennyLane and the Change of Coordinates (CC) implementations.
Figure 4: [Color online] Convergence of cost function and accuracy versus number of optimization steps, in the variational quantum classifier from Ref. classification_qnn for the Iris dataset. Comparison between PennyLane and the Change of Coordinates (CC) implementations.
Figure 5: [Color online] Convergence of cost function versus number of optimization steps, in the variational quantum thermalizer from Ref. variational_quantum_thermalizer. Comparison between PennyLane and the Change of Coordinates (CC) implementations.
...and 5 more figures
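
The frame-rotation picture of Figure 1(c) can be checked with a short calculation. This is a sketch under stated assumptions, not the authors' code. The rotation angle `phi`, its sign convention, and the function names are hypothetical. For a frame rotated by `phi`, a point $(x, f)$ has coordinates $x_r = x\cos\phi + f\sin\phi$ and $f_r = -x\sin\phi + f\cos\phi$, so the slope seen in the rotated frame is $df_r/dx_r = (-\sin\phi + f'\cos\phi)/(\cos\phi + f'\sin\phi)$. On a plateau, where $f'=0$, this equals $-\tan\phi$, which is non-zero for any small non-zero rotation.

```python
import math

def to_rotated_frame(x, fx, phi):
    # Coordinates of the point (x, f) in a frame rotated by phi
    # (sign convention assumed for this sketch).
    x_r = math.cos(phi) * x + math.sin(phi) * fx
    f_r = -math.sin(phi) * x + math.cos(phi) * fx
    return x_r, f_r

def rotated_slope(dfdx, phi):
    # df_r/dx_r = (df_r/dx) / (dx_r/dx), obtained by differentiating the map above.
    numerator = -math.sin(phi) + math.cos(phi) * dfdx
    denominator = math.cos(phi) + math.sin(phi) * dfdx
    return numerator / denominator

def curve(x):
    # Hypothetical smooth curve used only for the numerical cross-check.
    return x * x

phi = 0.3

# On a plateau (df/dx = 0) the slope in the rotated frame is -tan(phi).
plateau_slope = rotated_slope(0.0, phi)

# Cross-check the formula with finite differences at x = 0.5, where df/dx = 1.
x, h = 0.5, 1e-6
xa, fa = to_rotated_frame(x - h, curve(x - h), phi)
xb, fb = to_rotated_frame(x + h, curve(x + h), phi)
numeric = (fb - fa) / (xb - xa)
analytic = rotated_slope(2.0 * x, phi)

print("plateau slope in rotated frame:", plateau_slope)
print("numeric vs analytic slope:", numeric, analytic)
```

The formula breaks down when $\cos\phi + f'\sin\phi = 0$, that is, when the rotated $x_r$ axis is perpendicular to the tangent of the curve, so a practical choice of `phi` would need to avoid that case.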
