Characterizing Physical Memory Fragmentation

Mark Mansi, Michael M. Swift

TL;DR

The paper investigates external physical memory fragmentation in production systems, highlighting its impact on performance and on optimizations that need large contiguous regions, such as huge pages. It presents a principled production study of 248 UW–Madison machines, uncovering six memory-usage patterns and identifying file-cache activity and memory reclamation as major fragmentation drivers, while showing that real-world contiguity is often limited to smaller scales. To advance experimental methodology, the authors introduce andúril, a tool based on a Markov process (MP) that is intended to reproduce fragmentation patterns without replaying exact event histories, and they provide a validation framework that compares empirical class distributions to the MP’s stationary distribution. Despite these insights, the study finds that andúril often fails to reproduce end-to-end performance on real workloads, exposing limitations in current kernel memory-management extensibility and underscoring the need for contiguity-aware reclamation and more robust artificial-fragmentation methods in future research.
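The validation idea mentioned above, comparing an empirical class distribution to a Markov process's stationary distribution, can be illustrated with a minimal sketch. The three-class transition matrix, the empirical distribution, and the use of total variation distance are all hypothetical choices made for illustration; the paper's actual classes and scoring function are not given in this summary.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate pi with pi = pi * P by power iteration.

    P is a row-stochastic matrix given as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical transition matrix over three memory-region classes.
P = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]

# Hypothetical class distribution measured on a fragmented machine.
empirical = [0.24, 0.51, 0.25]

pi = stationary_distribution(P)
distance = total_variation(empirical, pi)
print([round(x, 4) for x in pi], round(distance, 4))
```

A small distance means the long-run behavior of the model matches the observed mix of classes. It says nothing about whether workloads then perform the same, which is the gap the summary reports for andúril.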

Abstract

External fragmentation of physical memory occurs when adjacent differently sized regions of allocated physical memory are freed at different times, causing free memory to be physically discontiguous. It can significantly degrade system performance and efficiency, such as reducing the ability to use huge pages, a critical optimization on modern large-memory systems. For decades system developers have sought to avoid and mitigate fragmentation, but few prior studies quantify and characterize it in production settings. Moreover, prior work often artificially fragments physical memory to create more realistic performance evaluations, but their fragmentation methodologies are ad hoc and unvalidated. Out of 13 papers, we found 11 different methodologies, some of which were subsequently found inadequate. The importance of addressing fragmentation necessitates a validated and principled methodology. Our work fills these gaps in knowledge and methodology. We conduct a study of memory fragmentation in production by observing 248 machines in the Computer Sciences Department at University of Wisconsin - Madison for a week. We identify six key memory usage patterns, and find that Linux's file cache and page reclamation systems are major contributors to fragmentation because they often obliviously break up contiguous memory. Finally, we create andúril, a tool to artificially fragment memory during experimental research evaluations. While andúril ultimately fails as a scientific tool, we discuss its design ideas, merits, and failings in hope that they may inspire future research.
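To make the link between fragmentation and huge pages concrete, the following sketch computes what fraction of free memory sits in blocks large enough for a 2 MB huge page. It assumes 4 KiB base pages and a buddy-allocator view of free memory as counts of free blocks per order, which is the kind of data Linux exposes in /proc/buddyinfo. The function names and the sample counts are hypothetical, and this is not necessarily how the authors measured contiguity.

```python
PAGE_SIZE = 4096       # assumption: 4 KiB base pages
HUGE_PAGE_ORDER = 9    # 2 MiB / 4 KiB = 512 = 2**9 base pages


def free_pages(free_blocks_per_order):
    """Total free base pages; a block of order k spans 2**k pages."""
    return sum(count << order
               for order, count in enumerate(free_blocks_per_order))


def huge_page_eligible_fraction(free_blocks_per_order,
                                min_order=HUGE_PAGE_ORDER):
    """Fraction of free memory held in blocks of at least min_order."""
    total = free_pages(free_blocks_per_order)
    if total == 0:
        return 0.0
    eligible = sum(count << order
                   for order, count in enumerate(free_blocks_per_order)
                   if order >= min_order)
    return eligible / total


# Hypothetical free-block counts for orders 0 through 10.
fragmented = [1024, 512, 256, 0, 0, 0, 0, 0, 0, 2, 1]

total_bytes = free_pages(fragmented) * PAGE_SIZE
fraction = huge_page_eligible_fraction(fragmented)
print(total_bytes // (1024 * 1024), "MiB free,", fraction, "huge-page eligible")
```

In this made-up example 20 MiB is free, yet only 40% of it could back 2 MB huge pages, because most free pages are scattered in small blocks.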

Paper Structure (36 sections, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Examples of memory usage patterns.
  • Figure 2: Amount of contiguity in free and allocated memory for the different memory usage patterns in Figure 1.
  • Figure 3: Median % of free memory vs median % of free memory that is contiguous enough for a 2MB huge page throughout the observation window for each machine. The 205 blue points indicate either low-memory-low-huge-page nodes or low-usage-high-huge-page nodes. The 43 red points indicate all other nodes.
  • Figure 4: Example: converting physical memory usage patterns to an MP. Different-sized boxes represent different-sized memory regions.
  • Figure 5: CDF of andúril accuracy scores for all machines.
  • ...and 4 more figures