Deep Reinforcement Learning with Task-Adaptive Retrieval via Hypernetwork

Yonggang Jin; Chenxu Wang; Tianyu Zheng; Liuyu Xiang; Yaodong Yang; Junge Zhang; Jie Fu; Zhaofeng He

TL;DR

The paper tackles sample inefficiency in deep reinforcement learning by introducing a hippocampus-inspired memory module that retrieves task-relevant past experiences. A task-conditioned hypernetwork adapts the retrieval network's parameters to the current task, and a dynamic modification mechanism coordinates the retrieval network with the decision network during PPO training. On multitask MiniGrid environments, the approach is reported to achieve better sample efficiency and higher rewards than strong baselines, especially as the number of tasks grows. The results indicate that task-aware memory retrieval, combined with dynamic coordination between retrieval and decision-making, can substantially improve multitask reinforcement learning.

Abstract

Deep reinforcement learning algorithms are often hampered by sample inefficiency: they depend heavily on many interactions with the environment to acquire accurate decision-making capabilities. In contrast, humans rely on the hippocampus to retrieve relevant information from past experiences of related tasks, which guides their decision-making when learning a new task, rather than depending exclusively on environmental interactions. Nevertheless, designing a hippocampus-like module that lets an agent incorporate past experiences into established reinforcement learning algorithms presents two challenges. The first is selecting the most relevant past experiences for the current task; the second is integrating those experiences into the decision network. To address these challenges, we propose a novel method that uses a retrieval network based on a task-conditioned hypernetwork, which adapts the retrieval network's parameters to the task. In addition, a dynamic modification mechanism strengthens the collaboration between the retrieval and decision networks. We evaluate the proposed method on a range of tasks in a multitask scenario in the MiniGrid environment. The experimental results show that the proposed method significantly outperforms strong baselines.
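To make the idea of task-adaptive retrieval concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the hypernetwork is a single linear map from a task embedding to the weights of a query projection, that episodic memory is a list of (key, return) pairs, and that retrieval is a softmax-weighted average of the stored returns. The class `HyperRetriever`, its methods, and all dimensions are hypothetical names chosen for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class HyperRetriever:
    """Hypothetical retrieval module whose query projection is generated per task."""

    def __init__(self, task_dim, state_dim, key_dim, seed=0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.key_dim = key_dim
        # Assumed hypernetwork: one linear map from the task embedding to the
        # flattened (key_dim x state_dim) query-projection matrix.
        self.hyper = [[rng.gauss(0.0, 0.5) for _ in range(task_dim)]
                      for _ in range(key_dim * state_dim)]

    def query_projection(self, task_embedding):
        """Generate the task-specific projection matrix from the task embedding."""
        flat = matvec(self.hyper, task_embedding)
        return [flat[r * self.state_dim:(r + 1) * self.state_dim]
                for r in range(self.key_dim)]

    def retrieve(self, task_embedding, state, memory):
        """Return (V_mem, attention weights) for a memory of (key, return) pairs."""
        query = matvec(self.query_projection(task_embedding), state)
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in memory]
        weights = softmax(scores)
        v_mem = sum(w * ret for w, (_, ret) in zip(weights, memory))
        return v_mem, weights


# Same state and same memory, two different task embeddings.
retriever = HyperRetriever(task_dim=2, state_dim=3, key_dim=2)
memory = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.1), ([-1.0, 0.0], 0.5)]
state = [0.2, -0.4, 1.0]
v_task1, w_task1 = retriever.retrieve([1.0, 0.0], state, memory)
v_task2, w_task2 = retriever.retrieve([0.0, 1.0], state, memory)
```

Because the projection is generated from the task embedding, the same observed state produces different attention weights over memory for the two tasks. A retriever with fixed parameters would return the same result for both.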

Paper Structure (23 sections, 19 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Episodic Memory
Task-Conditioned Hypernetwork
Method
Retrieval Based on Hypernetwork
Episodic Memory
Task-Conditioned Hypernetwork
Retrieval Network
Dynamic Modification Mechanism
Experiment
Experimental Setting
Evaluation Results
Ablation and Analysis
Task-Conditioned Hypernetwork
...and 8 more sections

Figures (6)

Figure 1: On the left, we present an illustrative example of Task-Adaptive Retrieval. Here, the hippocampus of the agent must adapt its retrieval strategies to accommodate the differing demands of Task 1 and Task 2. It retrieves different information for different tasks. In contrast, Non-Adaptive Retrieval, depicted on the right, relies solely on the observed state, leading to consistent outcomes across both tasks due to the lack of task-specific adaptability.
Figure 2: The model architecture. To start, a task-conditioned hypernetwork dynamically adapts the parameters of both the retrieval and decision networks according to the ongoing task. Subsequently, the retrieval network accesses the episodic memory to compute $\bm{V_{mem}}$. Lastly, the loss of the critic network undergoes modification through the incorporation of $\bm{V_{mem}}$, facilitated by a dynamic modification mechanism. As a result, gradients are computed, propagated, and utilized to update the network parameters.
Figure 3: Learning curves of EMPPOHypernet (w/o language) and EMPPOHypernet.
Figure 4: Learning curves of PPO, SEMPPO, and DEMPPO.
Figure 5: Learning curves of PPO, PPOHypernet, EMPPO-para, EMPPO-moe, and EMPPOHypernet (ours).
...and 1 more figures
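
The Figure 2 caption states that the critic loss is modified by incorporating $\bm{V_{mem}}$ through a dynamic modification mechanism, but the exact form is not given here. The sketch below shows one plausible reading and should not be taken as the paper's formulation. It assumes a squared-error term that pulls the critic's value estimates toward $\bm{V_{mem}}$, scaled by a weight that decays linearly over training. The functions `modification_weight` and `critic_loss`, the linear schedule, and the initial weight of 0.5 are all assumptions made for illustration.

```python
def modification_weight(step, total_steps, initial=0.5):
    """Assumed schedule: rely on memory early, then fade it out linearly."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial * (1.0 - progress)


def critic_loss(values, returns, v_mem, weight):
    """Mean squared return error plus a weighted pull toward V_mem."""
    n = len(values)
    return_term = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    memory_term = sum((v - m) ** 2 for v, m in zip(values, v_mem)) / n
    return return_term + weight * memory_term


# Toy batch of two states: critic predictions, empirical returns, retrieved values.
values = [0.5, 0.2]
returns = [1.0, 0.0]
v_mem = [0.8, 0.1]

early_loss = critic_loss(values, returns, v_mem, modification_weight(0, 100))
late_loss = critic_loss(values, returns, v_mem, modification_weight(100, 100))
```

With this schedule, the memory term contributes most at the start of training, when the critic is least reliable, and vanishes by the end, so the final loss reduces to the ordinary value loss.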
