Reinforcement Learning for Dynamic Memory Allocation

Arisrei Lim, Abhiram Maddukuri

TL;DR

The paper frames dynamic memory allocation as a Markov decision process (MDP) and investigates whether reinforcement learning can learn adaptive memory management policies that outperform fixed x-fit strategies (first-fit, best-fit, worst-fit) under varying and adversarial request patterns. It compares high-level actions (choosing among first-, best-, and worst-fit) with low-level actions (choosing an exact address), explores history-enabled states, and trains agents with DQN and PPO in simulated environments with page sizes up to 256. Results show that low-level action policies can learn useful mappings but often lag behind the baselines because of training costs and rare invalid actions; better results emerge with history-aware states or in mixed-adversarial scenarios, particularly with DQN+MLP on larger pages. The study presents RL as a promising direction for adaptive memory allocation that can mitigate fragmentation and improve utilization, while acknowledging limitations in realism, scalability, and benchmark breadth, and it points to future work on multi-page systems and real-system validation.
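To make the low-level action space concrete, the sketch below shows one way to decide whether an "exact address" action is valid for a given request. It is an illustration under stated assumptions, not the authors' environment: the page is modelled as a list of booleans (`True` means occupied), and the names `valid_addresses` and `is_valid_action` are hypothetical.

```python
from typing import List


def valid_addresses(page: List[bool], size: int) -> List[int]:
    """Return every start address where `size` contiguous free cells fit.

    Assumed encoding: page[i] is True when cell i is occupied.
    """
    if size <= 0:
        raise ValueError("request size must be positive")
    valid = []
    run = 0  # length of the run of free cells ending at the current index
    for i, used in enumerate(page):
        run = 0 if used else run + 1
        if run >= size:
            valid.append(i - size + 1)
    return valid


def is_valid_action(page: List[bool], address: int, size: int) -> bool:
    """A low-level action is valid only if the whole request fits at `address`."""
    return address in valid_addresses(page, size)


# Hypothetical 8-cell page: cells 0, 3, and 7 are occupied.
page = [True, False, False, True, False, False, False, True]
print(valid_addresses(page, 2))     # [1, 4, 5]
print(is_valid_action(page, 0, 1))  # False: cell 0 is occupied
```

A list like this could serve as an action mask during training, although the summary does not say whether the paper masks invalid actions or penalizes them.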

Abstract

In recent years, reinforcement learning (RL) has gained popularity and has been applied to a wide range of tasks. One domain where RL has been effective is resource management in systems. We extend this line of work to the novel domain of dynamic memory allocation. We consider dynamic memory allocation a suitable domain for RL because current algorithms such as first-fit, best-fit, and worst-fit can fail to adapt to changing conditions, which leads to fragmentation and suboptimal efficiency. In this paper, we present a framework in which an RL agent continuously learns from interactions with the system to improve its memory management tactics. We evaluate our approach through experiments with high-level and low-level action spaces and with different memory allocation patterns. Our results show that RL can train agents that match and surpass traditional allocation strategies, particularly in environments characterized by adversarial request patterns. We also explore history-aware policies that use previous allocation requests to improve the allocator's handling of complex request patterns. Overall, we find that RL offers a promising avenue for developing more adaptive and efficient memory allocation strategies, potentially overcoming the limitations of hardcoded allocation algorithms.
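For readers unfamiliar with the baselines named above, the following is a minimal sketch of first-fit, best-fit, and worst-fit on a single page. It is written under assumptions and is not taken from the paper: the page is a list of booleans (`True` means occupied), and `free_blocks` and `choose_block` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (start address, length) of a contiguous free block


def free_blocks(page: List[bool]) -> List[Block]:
    """List the free blocks of a page in address order."""
    blocks = []
    start = None  # start of the free run currently being scanned, if any
    for i, used in enumerate(page):
        if not used and start is None:
            start = i
        elif used and start is not None:
            blocks.append((start, i - start))
            start = None
    if start is not None:  # the page ends inside a free run
        blocks.append((start, len(page) - start))
    return blocks


def choose_block(page: List[bool], size: int, strategy: str) -> Optional[int]:
    """Return the start address picked by `strategy`, or None if nothing fits."""
    candidates = [b for b in free_blocks(page) if b[1] >= size]
    if not candidates:
        return None
    if strategy == "first":   # lowest address that fits
        return candidates[0][0]
    if strategy == "best":    # smallest block that fits
        return min(candidates, key=lambda b: b[1])[0]
    if strategy == "worst":   # largest block that fits
        return max(candidates, key=lambda b: b[1])[0]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical 12-cell page with free blocks of length 3, 2, and 5.
page = [False] * 3 + [True] + [False] * 2 + [True] + [False] * 5
for name in ("first", "best", "worst"):
    print(name, choose_block(page, 2, name))  # first 0, best 4, worst 7
```

Each strategy applies the same rule to every request regardless of the request pattern, which is the rigidity the abstract argues a learned policy could avoid.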

Paper Structure

This paper contains 13 sections, 1 equation, and 3 figures.

Figures (3)

  • Figure 1: Average Return and 95% Confidence Interval over 10000 Rollouts for Low-Level Action Network
  • Figure 2: Average Return and 95% Confidence Interval over 5 training sessions, each with 100 Rollouts on Adversarial Allocation Requests
  • Figure 3: Average Return and 95% Confidence Interval over 5 training sessions, each with 100 Rollouts on Mixed Allocation Requests