
Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search

Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yingyan Celine Lin

TL;DR

The paper tackles the challenge of deploying DRL agents under real-time, resource-constrained conditions by showing that optimal performance requires task-specific model sizes. It introduces Auto-Agent-Distiller (A2D), the first NAS framework tailored for DRL, augmented with an actor-critic (AC) based distillation mechanism that stabilizes the search and improves agent quality. A2D performs one-level differentiable NAS over a search space of 14 cells and 9 candidate operators, guided by the distillation loss and a cost term, to derive efficient DRL agents that match or exceed manually designed ones with substantially fewer FLOPs. Experiments on Atari show that A2D delivers competitive scores with significantly improved efficiency and automatically adapts network size to task difficulty, highlighting its potential for automated, on-device DRL deployment.
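How a cost term steers a differentiable search can be sketched with a toy, single-edge version of the softmax relaxation used in DARTS-style NAS. This is a minimal sketch under stated assumptions, not A2D's implementation: the operator names, per-operator losses, normalized FLOPs, and the weight `lam` are all hypothetical, and the per-operator losses are held fixed as stand-ins for the distillation loss, so the joint weight update that makes A2D's search one-level is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def search_step(alpha, op_losses, op_flops, lam, lr):
    """One gradient-descent step on the architecture logits `alpha`.

    Relaxed objective: sum_i p_i * (loss_i + lam * flops_i), with p = softmax(alpha).
    Its gradient w.r.t. alpha_j is p_j * (L_j - sum_i p_i * L_i),
    where L_i = loss_i + lam * flops_i.
    """
    probs = softmax(alpha)
    per_op = [loss + lam * f for loss, f in zip(op_losses, op_flops)]
    mean = sum(p * L for p, L in zip(probs, per_op))
    grads = [p * (L - mean) for p, L in zip(probs, per_op)]
    return [a - lr * g for a, g in zip(alpha, grads)]

def search(op_losses, op_flops, lam, steps=200, lr=1.0):
    """Run the toy search from uniform logits and return the final logits."""
    alpha = [0.0] * len(op_losses)
    for _ in range(steps):
        alpha = search_step(alpha, op_losses, op_flops, lam, lr)
    return alpha

def derive(alpha, names):
    """Discretize the relaxed edge by keeping the highest-weight operator."""
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return names[best]

# Hypothetical candidates for one edge: fixed stand-in losses and normalized FLOPs.
names = ["conv3x3", "conv5x5", "skip"]
losses = [0.50, 0.48, 0.70]
flops = [0.3, 0.8, 0.0]

chosen_no_cost = derive(search(losses, flops, lam=0.0), names)
chosen_with_cost = derive(search(losses, flops, lam=0.5), names)
print(chosen_no_cost, chosen_with_cost)
```

With `lam=0.0` the search keeps the lowest-loss `conv5x5`; with `lam=0.5` the FLOPs penalty shifts the choice to the cheaper `conv3x3`, which is the kind of score/efficiency trade-off the cost term is meant to expose.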

Abstract

AlphaGo's astonishing performance has ignited explosive interest in developing deep reinforcement learning (DRL) for numerous real-world applications, such as intelligent robotics. However, the often prohibitive complexity of DRL is at odds with the real-time control and constrained resources required in many DRL applications, limiting the great potential of DRL-powered intelligent devices. While substantial efforts have been devoted to compressing other deep learning models, existing work has barely scratched the surface of compressing DRL. In this work, we first identify that there exists an optimal DRL model size that maximizes both test scores and efficiency, motivating the need for task-specific DRL agents. We therefore propose the Auto-Agent-Distiller (A2D) framework, which to the best of our knowledge is the first application of neural architecture search (NAS) to DRL, automatically searching for optimal DRL agents for various tasks that optimize both test scores and efficiency. Specifically, we demonstrate that vanilla NAS can easily fail to find the optimal agents because it leads to high variance and instability in DRL training, and we then develop a novel distillation mechanism that distills knowledge from both the teacher agent's actor and critic to stabilize the search process and improve the optimality of the searched agents. Extensive experiments and ablation studies consistently validate our findings as well as the advantages and general applicability of A2D, which outperforms manually designed DRL agents in both test scores and efficiency. All code will be released upon acceptance.
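Distilling from both the teacher's actor and critic can be sketched as a loss with two terms: an actor term that matches the student's action distribution to the teacher's, and a critic term that matches their state-value estimates. This is a minimal sketch, assuming a KL divergence for the actor term, a squared error for the critic term, and a hypothetical weight `beta`; this summary does not state A2D's exact loss, so the formulation and function names here are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def actor_critic_distill_loss(teacher_batch, student_batch, beta=0.5):
    """Mean over states of KL(pi_T || pi_S) + beta * (V_T - V_S)^2.

    Each batch entry is an (action_logits, state_value) pair;
    `beta` is a hypothetical weight on the critic term.
    """
    total = 0.0
    for (t_logits, t_value), (s_logits, s_value) in zip(teacher_batch, student_batch):
        actor_term = kl_divergence(softmax(t_logits), softmax(s_logits))
        critic_term = (t_value - s_value) ** 2
        total += actor_term + beta * critic_term
    return total / len(teacher_batch)

# Hypothetical single-state batches: (action logits, value estimate).
teacher = [([2.0, 0.5, -1.0], 1.0)]
student_same = [([2.0, 0.5, -1.0], 1.0)]
student_value_off = [([2.0, 0.5, -1.0], 3.0)]
student_policy_off = [([0.0, 0.0, 0.0], 1.0)]

print(actor_critic_distill_loss(teacher, student_same))
print(actor_critic_distill_loss(teacher, student_value_off))
print(actor_critic_distill_loss(teacher, student_policy_off))
```

A student that copies the teacher exactly incurs zero loss, a value mismatch alone is penalized only through the critic term, and a policy mismatch alone only through the actor term; supervising both heads gives the student a denser training signal than environment rewards alone, which is the intuition behind using it to stabilize the search.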

Paper Structure

This paper contains 16 sections, 10 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Test scores averaged over 30 episodes during the training of five models on four Atari games (Bellemare et al., 2013).
  • Figure 2: Test score evolution during the search processes of three different search schemes on four Atari games (Bellemare et al., 2013). Vanilla-DNAS denotes applying NAS directly without distillation, while A2D-One-level and A2D-Bi-level search under the guidance of the distillation loss using one-level and bi-level optimization, respectively.
  • Figure 3: Test score and efficiency trade-offs achieved by A2D and manually designed DRL agents on 12 Atari games (Bellemare et al., 2013), where the three manual designs use the vanilla network, ResNet-14, and ResNet-20 introduced in the paper's scalability section.