Hybrid-Quantum Neural Architecture Search for The Proximal Policy Optimization Algorithm
Moustafa Zada
TL;DR
This work investigates whether hybrid classical-quantum architectures found by neural architecture search (NAS) can enhance PPO performance on the CartPole task under NISQ constraints. It uses Regularized Evolution NAS to search a large space of mixed classical-quantum architectures, and formulates PPO through the surrogate objective $L^{CPI}$, the clipped objective $L^{CLIP}$, and the combined objective $L^{CLIP+VF+S}$, which adds value-function ($VF$) and entropy ($S$) terms. The key finding is that classical models dominated the results, with the best hybrid ranking 11th among unique models, and that quantum-layer configurations offering consistent gains remain elusive. The study provides practical design insights, such as favoring small-qubit quantum layers and careful entanglement choices, and underscores the need for broader environment testing and more robust hybrid NAS methods to ascertain when quantum components may offer real advantages in reinforcement learning.
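For reference, the three objectives named above can be written as follows. This is a sketch in the standard notation of the PPO literature; the symbols $r_t(\theta)$, $\hat{A}_t$, $\epsilon$, $c_1$, $c_2$ and $V_t^{\text{targ}}$ are assumptions taken from that notation and may differ from the ones used in the body of this paper.

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\, \hat{A}_t \right]$$

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\, \hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]$$

$$L^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2\, S[\pi_\theta](s_t) \right], \qquad L_t^{VF}(\theta) = \left( V_\theta(s_t) - V_t^{\text{targ}} \right)^2$$

Here $\hat{A}_t$ is an advantage estimate, $\epsilon$ is the clipping range, and $c_1$, $c_2$ weight the value-function loss and the entropy bonus $S$. The combined objective is maximized, so the value-function loss enters with a negative sign.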
Abstract
Recent studies in quantum machine learning have advocated hybrid models as a way to work around the limitations of current Noisy Intermediate-Scale Quantum (NISQ) devices. Most of them, however, do not explain or interpret the choices that led to their exact architectures, nor do they differentiate good hybrid architectures from bad ones. This research attempts to address that gap in the literature by using the Regularized Evolution algorithm to search for the optimal hybrid classical-quantum architecture for the Proximal Policy Optimization (PPO) algorithm, a well-known reinforcement learning algorithm. Ultimately, the classical models dominated the leaderboard, with the best hybrid model coming in eleventh place among all unique models. We also try to explain the factors that contributed to these results, and why some models behaved better than others, in the hope of building a better intuition about what should be considered good practice for designing an efficient hybrid architecture.
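To make the search procedure concrete, the following is a minimal sketch of the generic Regularized Evolution (aging evolution) loop, not the implementation used in this work. The two-way layer vocabulary, the fixed depth, the helper names and `toy_fitness` are all hypothetical placeholders; in a real search the fitness would come from training a PPO agent with the candidate architecture, and the toy score here says nothing about the reported results.

```python
import random
from collections import deque

# Hypothetical search space: each architecture is a tuple of layer types.
LAYER_CHOICES = ("classical", "quantum")


def random_architecture(rng, depth=4):
    """Sample an architecture uniformly from the toy search space."""
    return tuple(rng.choice(LAYER_CHOICES) for _ in range(depth))


def mutate(arch, rng):
    """Return a copy of `arch` with exactly one layer switched to another type."""
    i = rng.randrange(len(arch))
    new_layer = rng.choice([c for c in LAYER_CHOICES if c != arch[i]])
    return arch[:i] + (new_layer,) + arch[i + 1:]


def toy_fitness(arch):
    """Placeholder score (assumption): stands in for the reward of a trained agent."""
    return sum(1 for layer in arch if layer == "classical")


def regularized_evolution(cycles=50, population_size=10, sample_size=3,
                          seed=0, fitness=toy_fitness):
    rng = random.Random(seed)
    population = deque()  # ordered oldest (left) to youngest (right)
    history = []          # every architecture ever evaluated

    # Initialise the population with random architectures.
    while len(population) < population_size:
        arch = random_architecture(rng)
        entry = (arch, fitness(arch))
        population.append(entry)
        history.append(entry)

    for _ in range(cycles):
        # Tournament selection: the best of a random sample becomes the parent.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda e: e[1])

        # The mutated child joins the population.
        child = mutate(parent[0], rng)
        entry = (child, fitness(child))
        population.append(entry)
        history.append(entry)

        # Regularization by aging: discard the oldest member, not the worst.
        population.popleft()

    best = max(history, key=lambda e: e[1])
    return best, history
```

Calling `best, history = regularized_evolution()` returns the highest-scoring `(architecture, fitness)` pair seen during the search together with the full evaluation history. Removing the oldest member rather than the worst one is what distinguishes this scheme from plain tournament-based evolution: an architecture survives only if its descendants keep being selected.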
