Offloading Data Center Tax

Akshay Revankar, Charan Renganathan, Sartaj Wariah

TL;DR

The paper addresses data center performance by offloading multiple interrelated tax components together rather than targeting a single bottleneck. It profiles MongoDB within the DeathStarBench Social Network workload, using perf and flame graphs to map cycles to tax components, and analyzes cache and memory behavior with Cache Allocation Technology (CAT) and Memory Bandwidth Monitoring (MBM). The study finds that Network, Compression, and Memory Allocation dominate cycle counts, and that cycles spent in Memory Allocation and Compression are strongly correlated. It suggests that co-offloading these components (potentially via libOS, DPDK, or similar approaches) and resizing cache resources could yield meaningful gains without degrading tail latency. These insights inform hardware-software co-design and resource-allocation strategies for warehouse-scale data centers, indicating where accelerators or software optimizations would have the widest effect across the fleet.

Abstract

Today's data centers run diverse workloads that share many common lower-level functions, called tax components. Because these components are shared, optimizing any one of them improves performance across the data center fleet. Performance enhancements in tax components are typically achieved by offloading them to accelerators; however, it is not practical to offload every tax component. The goal of this paper is to identify opportunities to offload more than one tax component together. We focus on MongoDB, a common microservice used by a large number of data center applications. We profile MongoDB running as part of the DeathStarBench benchmark suite, identifying its tax components and their microarchitectural implications. Based on the inferences drawn from this profiling, we make observations and suggestions for offloading a few of the tax components in this application.
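As a rough illustration of how profiled cycles might be attributed to tax components and then compared, the following sketch classifies folded stack samples (the `frame;frame;... count` format produced by the FlameGraph `stackcollapse-perf.pl` script) and computes a Pearson correlation between two per-interval series. The keyword table, the function names, and the "Other" bucket are assumptions made for illustration; the classification rules actually used in the paper are not stated in this section.

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword table: which symbol substrings count toward which
# tax component. These keywords are illustrative assumptions only.
TAX_KEYWORDS = {
    "Network": ("tcp_", "sock_", "sendmsg", "recvmsg"),
    "Compression": ("snappy", "zlib", "zstd"),
    "Memory Allocation": ("malloc", "tc_new", "operator new"),
}


def classify(stack):
    """Return the tax component of a folded stack, checking the leaf frame first."""
    for frame in reversed(stack.split(";")):
        lowered = frame.lower()
        for component, keywords in TAX_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                return component
    return "Other"


def cycles_by_component(folded_lines):
    """Sum sample counts per component from 'frame;frame;... count' lines."""
    totals = Counter()
    for line in folded_lines:
        # The count follows the last space; frames themselves may contain spaces.
        stack, _, count = line.strip().rpartition(" ")
        totals[classify(stack)] += int(count)
    return totals


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return float("nan")
    return cov / (dev_x * dev_y)
```

Under these assumptions, one would call `cycles_by_component` once per sampling interval, collect the per-interval totals for two components (for example Memory Allocation and Compression), and pass the two resulting lists to `pearson`. Checking the leaf-most frame first attributes a sample to the innermost matching function, so an allocation made inside a compression routine counts toward Memory Allocation; a different attribution policy would give different totals.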

Paper Structure

This paper contains 23 sections, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: A small social graph extracted from Facebook used in the study
  • Figure 2: The compose post workload flow
  • Figure 3: Peak workload calculation for different requests-per-second (RPS) rates at constant connection values
  • Figure 4: Flamegraph showing stack traces on MongoDB for 1225 RPS load. The regions highlighted in magenta denote network operations
  • Figure 5: Correlation between cycles spent running each tax component
  • ...and 4 more figures