Hybrid-Quantum Neural Architecture Search for The Proximal Policy Optimization Algorithm

Moustafa Zada

TL;DR

This work investigates whether hybrid classical-quantum architectures found by neural architecture search (NAS) can improve PPO performance on the CartPole task under NISQ constraints. It uses Regularized Evolution to search a large space of mixed classical-quantum architectures, with PPO formulated through the surrogate objective $L^{CPI}$, its clipped variant $L^{CLIP}$, and the combined objective $L^{CLIP+VF+S}$, which adds value-function and entropy terms. The key finding is that classical models dominated the results: the best hybrid ranked 11th among unique models, and no quantum-layer configuration offering consistent gains was identified. The study offers practical design insights, such as favoring quantum layers with few qubits and choosing entanglement carefully, and underscores the need for testing on more environments and for more robust hybrid NAS methods to determine when quantum components offer real advantages in reinforcement learning.
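
For reference, the three PPO objectives named above are usually written as follows in the PPO literature. The notation is an assumption, since this summary does not define it: $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ an advantage estimate, $\epsilon$ the clipping range, $L^{VF}_t$ a squared-error value loss, $S$ an entropy bonus, and $c_1, c_2$ weighting coefficients. The paper's own equations may differ in detail.

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\,\hat{A}_t \right]
$$

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]
$$

$$
L^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L^{CLIP}_t(\theta) - c_1\, L^{VF}_t(\theta) + c_2\, S[\pi_\theta](s_t) \right]
$$

The clipping removes the incentive to move $r_t(\theta)$ outside $[1-\epsilon,\, 1+\epsilon]$, which keeps each policy update close to the policy that collected the data.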

Abstract

Recent studies in quantum machine learning have advocated hybrid models as a way to cope with the limitations of current Noisy Intermediate-Scale Quantum (NISQ) devices. What most of them lack, however, is an explanation of why those exact architectures were chosen and of what distinguishes good hybrid architectures from bad ones. This research attempts to close that gap in the literature by using the Regularized Evolution algorithm to search for the optimal hybrid classical-quantum architecture for Proximal Policy Optimization (PPO), a well-known reinforcement learning algorithm. Ultimately, the classical models dominated the leaderboard, with the best hybrid model coming in eleventh place among all unique models. We also try to explain the factors that contributed to these results, and why some models behaved better than others, in the hope of building a better intuition about what should be considered good practice when designing an efficient hybrid architecture.
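
To make the search procedure concrete, here is a minimal sketch of Regularized Evolution (also known as aging evolution) in Python. It is not the paper's implementation. The search space (`LAYER_TYPES`, a fixed depth of 4), the single-layer `mutate` operation, and `toy_fitness` are all hypothetical stand-ins; in the actual study, the fitness would come from training PPO with the candidate architecture, and the mutations are those described in the paper's "The Mutations Used" section.

```python
import collections
import random

# Hypothetical search space: each architecture is a tuple of layer types.
LAYER_TYPES = ("classical", "quantum")


def random_architecture(rng, depth=4):
    """Sample a random architecture of fixed depth (assumed for illustration)."""
    return tuple(rng.choice(LAYER_TYPES) for _ in range(depth))


def mutate(arch, rng):
    """Hypothetical mutation: flip the type of one randomly chosen layer."""
    i = rng.randrange(len(arch))
    child = list(arch)
    child[i] = "quantum" if arch[i] == "classical" else "classical"
    return tuple(child)


def toy_fitness(arch):
    """Arbitrary placeholder score (number of adjacent layers that differ).

    This stands in for "train PPO and measure average reward" and says
    nothing about which architectures performed well in the paper.
    """
    return sum(a != b for a, b in zip(arch, arch[1:]))


def regularized_evolution(fitness, rng, population_size=10, sample_size=3, cycles=50):
    """Run aging evolution and return (best individual, population, history)."""
    population = collections.deque()  # oldest individual sits on the left
    history = []                      # every individual ever evaluated

    # Initialise the population with random architectures.
    while len(population) < population_size:
        arch = random_architecture(rng)
        individual = (arch, fitness(arch))
        population.append(individual)
        history.append(individual)

    for _ in range(cycles):
        # Tournament selection: the best of a random sample becomes the parent.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda ind: ind[1])

        # Create and evaluate a mutated child.
        child_arch = mutate(parent[0], rng)
        child = (child_arch, fitness(child_arch))
        population.append(child)
        history.append(child)

        # Regularization step: discard the oldest individual, not the worst.
        population.popleft()

    best = max(history, key=lambda ind: ind[1])
    return best, population, history


if __name__ == "__main__":
    best, _, _ = regularized_evolution(toy_fitness, random.Random(0))
    print(best)
```

Removing the oldest individual rather than the worst one is what distinguishes this method from plain tournament-based evolution: an architecture only persists in the population if its descendants keep being selected, which reduces the influence of a single lucky evaluation.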

Paper Structure (11 sections, 4 equations, 1 figure, 1 table)

This paper contains 11 sections, 4 equations, 1 figure, 1 table.

Introduction
Related Work
Background
Proximal Policy Optimization
Regularized Evolution
Experiments
The Mutations Used
Experimental Constraints
Results
Conclusions
Future Work

Figures (1)

Figure 1: Average reward over approximately 950 iterations
