
Associative Memory Based Experience Replay for Deep Reinforcement Learning

Mengyuan Li, Arman Kazemi, Ann Franchesca Laguna, X. Sharon Hu

TL;DR

This work targets the latency bottleneck of prioritized experience replay (PER) in deep Q-network (DQN) agents by introducing AMPER, a hardware-software co-design that uses associative memory (AM) to replace PER's tree-traversal priority sampling. AMPER has two AM-based variants, AMPER-k and AMPER-fr, which construct a candidate set of priorities (CSP) using AM-enabled k-nearest-neighbor (kNN) and fixed-radius nearest-neighbor (frNN) searches, respectively, and then sample from it. A dedicated AM-based accelerator (TCAM arrays, a uniform random number generator (URNG), and a CSP buffer) supports CSP construction and sampling, achieving up to 270× lower latency than GPU-based PER with comparable learning performance on standard OpenAI Gym tasks. Because these searches are performed in memory, the approach sidesteps the memory wall in online DRL: it preserves PER-like sampling behavior while significantly reducing data movement and irregular memory accesses, enabling real-time learning with large replay memories.
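To make CSP construction more concrete, here is a minimal software emulation of the two variants in Python. It is a sketch under stated assumptions, not the authors' algorithm. The priority range is split into `m` equal-width groups, and one uniform random query is drawn per group (standing in for the URNG). AMPER-k is modeled as selecting the `k_i` stored priorities nearest to each query, with `k_i` hypothetically growing with the group index so that high-priority groups contribute more candidates. AMPER-fr is modeled as selecting every priority within a fixed `radius` of each query. The minibatch is then drawn uniformly from the candidate set (CSP). The function names and parameters (`build_csp_knn`, `build_csp_frnn`, `sample_minibatch`, `k_base`, `radius`) are hypothetical. In hardware, each nearest-neighbor search would be a single parallel AM lookup rather than the linear scan used here.

```python
import random


def _group_queries(priorities, m, rng):
    """One uniform random query per equal-width priority group (assumed grouping)."""
    width = max(priorities) / m
    return [rng.uniform(i * width, (i + 1) * width) for i in range(m)]


def build_csp_knn(priorities, m, k_base, rng):
    """AMPER-k-style CSP (hypothetical): k_i = k_base * (i + 1) nearest entries per group query."""
    csp = []
    for i, query in enumerate(_group_queries(priorities, m, rng)):
        # Assumption: higher-priority groups contribute more candidates.
        k_i = min(k_base * (i + 1), len(priorities))
        # Software stand-in for an AM best-match search: rank all entries by distance to the query.
        ranked = sorted(range(len(priorities)), key=lambda j: abs(priorities[j] - query))
        csp.extend(ranked[:k_i])
    return csp


def build_csp_frnn(priorities, m, radius, rng):
    """AMPER-fr-style CSP (hypothetical): every entry within `radius` of each group query."""
    csp = []
    for query in _group_queries(priorities, m, rng):
        csp.extend(j for j, p in enumerate(priorities) if abs(p - query) <= radius)
    return csp


def sample_minibatch(csp, batch_size, rng):
    """Uniform sampling from the CSP; an index listed twice is twice as likely to be drawn."""
    if not csp:
        raise ValueError("empty CSP: increase k_base or radius")
    return [rng.choice(csp) for _ in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    priorities = [0.1, 0.2, 0.2, 0.5, 0.9, 1.3, 1.4, 2.0]  # toy priorities
    csp = build_csp_knn(priorities, m=4, k_base=1, rng=rng)
    print(len(csp), sample_minibatch(csp, batch_size=4, rng=rng))
```

Because the CSP is a multiset, an experience selected by several group queries is proportionally more likely to be drawn. This is one simple way to bias sampling toward higher priorities without a tree traversal.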

Abstract

Experience replay is an essential component of deep reinforcement learning (DRL): it stores past experiences and supplies them to the agent for learning in real time. Prioritized experience replay (PER) has recently proven powerful and is widely deployed in DRL agents. However, implementing PER on conventional CPU or GPU architectures incurs significant latency overhead because of its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach to building an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely used but time-consuming tree-traversal-based priority sampling in PER while preserving learning performance. Further, we design an in-memory computing hardware architecture based on AM that supports AMPER by leveraging parallel in-memory search operations. Running on the proposed hardware, AMPER achieves comparable learning performance with a 55× to 270× latency improvement over state-of-the-art PER running on a GPU.
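For context, the tree-traversal sampling that AMPER replaces is usually built on a sum tree (Figure 2(c)). Each internal node stores the sum of its children's priorities. To sample, a value Y is drawn uniformly from [0, total), and the tree is walked from the root to the leaf whose cumulative-priority interval contains Y. The sketch below is a minimal, generic Python version, not the GPU baseline measured in the paper. The class name `SumTree` and the toy priorities are illustrative. Each draw makes about log2(N) dependent reads scattered across the array, which is the kind of frequent, irregular memory access the abstract identifies as the latency bottleneck.

```python
import random


class SumTree:
    """Array-backed sum tree: leaves at [capacity, 2*capacity), root at index 1."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx, priority):
        # Write the leaf, then refresh every ancestor sum up to the root.
        pos = idx + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        return self.tree[1]

    def find(self, y):
        """Return the leaf whose cumulative-priority interval contains y, with 0 <= y < total."""
        pos = 1
        while pos < self.capacity:        # descend until a leaf is reached
            left = 2 * pos
            if y < self.tree[left]:
                pos = left                # y falls in the left subtree
            else:
                y -= self.tree[left]      # skip the left subtree's mass
                pos = left + 1
        return pos - self.capacity

    def sample(self, rng):
        # Draw Y uniformly from [0, total): leaf i is returned with probability p_i / total.
        return self.find(rng.random() * self.total())


tree = SumTree(4)
for i, p in enumerate([3.0, 1.0, 2.0, 4.0]):  # toy priorities, total = 10
    tree.update(i, p)
print(tree.total(), tree.find(4.0))  # 10.0 2  (Y = 4 lands in leaf 2)
```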

Paper Structure (22 sections, 4 equations, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of a DQN agent interacting with the environment. The agent has three main components: (1) action network, (2) target network, and (3) ER memory.
  • Figure 2: Illustration of PER implementation. (a) An example with 4 prioritized experiences. (b) The basic idea of sum-based sampling. (c) The sum-tree based implementation of (b). Leaf nodes contain the priority values. The search process of $Y=4$ is highlighted in red. (d) A high-level conceptual view of AMPER for the example in (a).
  • Figure 3: (a) Generic AM array structure (4×8 array) based on the NOR connection. Different match schemes: (b) exact match returns the rows identical to the input query; (c) best match returns the row with the shortest distance from the input query (hu2021memory).
  • Figure 4: Latency breakdown for executing the UER-DQN and PER-DQN algorithms in the CartPole and Atari Pong environments. Size is the ER memory size, and step is the total number of time steps.
  • Figure 5: Key AMPER concepts: (a) Distribution of all priorities. The x-axis is the priority value; the y-axis is the count for each distinct priority value. (b) Example of kNN-based AMPER. Five groups ($m=5$) are used (separated by thick black lines), and the priorities in the red-dashed blocks are selected. (c) Example of frNN-based AMPER. Only one group is shown, since the other groups follow the same idea.
  • ...and 4 more figures
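The two match schemes in Figure 3 can be emulated in software to show what a single AM search returns. This Python sketch illustrates the idea and does not model the circuit. Each row is treated as a ternary TCAM word over {0, 1, x}, where x matches either bit. Best match uses the count of mismatching cared-about bits (a Hamming-style distance) as its distance. The rows are scanned one by one, whereas the hardware compares the query against all rows in parallel. The stored words and queries are made up for illustration.

```python
def mismatches(stored, query):
    """Count bit positions where a stored TCAM word disagrees with the query ('x' = don't care)."""
    return sum(s != "x" and s != q for s, q in zip(stored, query))


def exact_match(rows, query):
    # Exact match: every row that agrees with the query on all cared-about bits.
    return [i for i, row in enumerate(rows) if mismatches(row, query) == 0]


def best_match(rows, query):
    # Best match: the row with the fewest mismatches (the first such row on ties).
    distances = [mismatches(row, query) for row in rows]
    return distances.index(min(distances))


rows = ["01101001", "0110100x", "11110000", "00001111"]  # hypothetical 4x8 array contents
print(exact_match(rows, "01101000"))  # [1]
print(best_match(rows, "11110001"))   # 2
```

In AMPER's setting, exact match would locate specific stored entries, while best match is the primitive that nearest-neighbor (kNN/frNN) priority searches build on.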