Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning
Pablo Bermejo, Borja Aizpurua, Roman Orus
TL;DR
This work tackles slow convergence in gradient-based optimization caused by barren plateaus and local minima, in both classical and quantum ML. It introduces a self-consistent boosting strategy that enlarges the effective search space by treating the cost function $f(\vec{x})$ as an extra coordinate and performing updates in transformed coordinates, using either hyperspherical coordinates $(\theta_1,\dots,\theta_n,r)$ or frame rotations $R_2\in SO(2)$ in the $(x_i,f(\vec{x}))$ plane of the extended point $P=[x_1,\dots,x_n,f(\vec{x})]$. By alternating between updates in the transformed space and projections back onto the original landscape, the method mitigates flat regions without altering the objective values, yielding faster convergence and improved stability across several quantum ML benchmarks. The approach is general and could reduce the computational cost and energy consumption of AI systems. Future work includes other coordinate changes, dynamic non-linear transformations, and applications to larger deep-learning and tensor-network models.
Abstract
Machine learning algorithms, in both their classical and quantum versions, rely heavily on gradient-based optimization algorithms such as gradient descent and the like. Their overall performance depends on the appearance of local minima and barren plateaus, which slow down calculations and lead to non-optimal solutions. In practice, this results in dramatic computational and energy costs for AI applications. In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, alleviating the effect of barren plateaus and local minima. Our method is based on coordinate transformations, somewhat similar to variational rotations, which add extra directions in parameter space that depend on the cost function itself and allow the configuration landscape to be explored more efficiently. We benchmark the method by boosting a number of quantum machine learning algorithms, obtaining a significant improvement in their performance.
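To illustrate why a coordinate change that involves the cost function can help on flat regions, the following is a minimal sketch for a single parameter $x$, assuming a fixed rotation angle `phi` in the $(x, f)$ plane. The function names, the sign convention of the rotation, and the choice of a constant angle are illustrative assumptions, not the authors' implementation. The sketch only shows that a vanishing slope in the original frame becomes a slope of $\tan\phi$ in the rotated frame.

```python
import math


def rotate_point(x, f, phi):
    """Rotate the point (x, f) by angle phi with a matrix in SO(2).

    Hypothetical helper: the sign convention is an assumption.
    """
    c, s = math.cos(phi), math.sin(phi)
    return c * x - s * f, s * x + c * f


def rotated_slope(dfdx, phi):
    """Slope df'/dx' of the curve (x, f(x)) as seen in the rotated frame.

    Follows from x' = c*x - s*f and f' = s*x + c*f by the chain rule:
    df'/dx' = (s + c * df/dx) / (c - s * df/dx).
    """
    c, s = math.cos(phi), math.sin(phi)
    return (s + c * dfdx) / (c - s * dfdx)


if __name__ == "__main__":
    phi = 0.1  # illustrative angle, not a value from the paper
    # On a plateau the original slope is (close to) zero ...
    print("original slope:", 0.0)
    # ... while the slope in the rotated frame is tan(phi), which is nonzero.
    print("rotated slope: ", rotated_slope(0.0, phi))
```

A gradient step taken with the rotated slope therefore does not stall on the plateau. How the step is projected back onto the original landscape, and how the angle is chosen, are left out of this sketch.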
