Offloading Data Center Tax
Akshay Revankar, Charan Renganathan, Sartaj Wariah
TL;DR
The paper tackles the problem of optimizing data center performance by offloading multiple interrelated tax components together rather than targeting a single bottleneck. It profiles MongoDB within the DeathStarBench Social Network workload, using perf and flame graphs to map cycles to tax components, and analyzes cache and memory behavior with Cache Allocation Technology (CAT) and Memory Bandwidth Monitoring (MBM). The study finds that Network, Compression, and Memory Allocation dominate cycle counts, with strong correlations between Memory Allocation and Compression. It suggests that co-offloading these components (potentially via libOS, DPDK, or similar approaches) and resizing cache resources could yield meaningful gains without degrading tail latency. These insights inform hardware-software co-design and resource-allocation strategies for warehouse-scale data centers, indicating where accelerators or software optimizations would have the largest effect across the fleet.
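To make the "map cycles to tax components" step concrete, here is a minimal sketch of one way such an attribution could be done. It assumes the perf samples have already been collapsed into the folded-stack text format used by flame graph tooling (semicolon-separated frames followed by a sample count). The keyword table, the function names, and the sample stacks are all hypothetical illustrations; the paper's actual classification rules are not stated in this summary.

```python
from collections import Counter

# Hypothetical keyword table: substrings of symbol names that hint at a tax
# component. These are illustrative assumptions, not the paper's rules.
TAX_KEYWORDS = {
    "Network": ("tcp_", "sock_", "net_rx"),
    "Compression": ("snappy", "zlib", "zstd"),
    "Memory Allocation": ("malloc", "free"),
}


def classify(stack):
    """Return the tax component of the deepest frame that matches a keyword."""
    # Frames are ordered root -> leaf, so scan from the leaf backwards.
    for frame in reversed(stack.split(";")):
        name = frame.lower()
        for component, keywords in TAX_KEYWORDS.items():
            if any(keyword in name for keyword in keywords):
                return component
    return "Other"


def tax_breakdown(folded_lines):
    """Return each component's share of the total sample count."""
    counts = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The sample count follows the last space; frame names may contain spaces.
        stack, _, samples = line.rpartition(" ")
        counts[classify(stack)] += int(samples)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Invented sample stacks, for illustration only.
SAMPLE = [
    "mongod;ServiceEntryPoint;tcp_sendmsg 30",
    "mongod;WiredTiger;snappy::RawCompress 20",
    "mongod;WiredTiger;snappy::RawCompress;tcmalloc::allocate 10",
    "mongod;BSONObj::woCompare 40",
]

if __name__ == "__main__":
    for component, share in sorted(tax_breakdown(SAMPLE).items()):
        print(f"{component}: {share:.0%}")
```

Attributing each stack to its deepest matching frame means an allocation made from inside a compression routine counts toward Memory Allocation rather than Compression. That is a design choice of this sketch; other attribution rules are equally plausible.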
Abstract
Today's data centers run diverse workloads that share many common lower-level functions, called tax components. Optimizing any one tax component therefore improves performance across the data center fleet. Typically, tax components are accelerated by offloading them to dedicated hardware; however, it is not practical to offload every tax component. The goal of this paper is to identify opportunities to offload more than one tax component together. We focus on MongoDB, a common microservice used by a large number of data center applications. We profile MongoDB running as part of the DeathStarBench benchmark suite, identifying its tax components and their microarchitectural implications. Based on the resulting observations, we suggest which of the application's tax components to offload.
