A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Ullah Hafeez, Arif Merchant, Richard McDougall
TL;DR
This paper targets the high storage-related total cost of ownership (TCO) of warehouse-scale computers and the practical obstacles to deploying monolithic ML models for data placement. It introduces a Bring-Your-Own-Model cross-layer design in which each workload trains a lightweight application-layer model that predicts workload importance, and a co-designed storage-layer heuristic uses these predictions to drive data placement decisions. With a prototype in a production framework at Google and large-scale simulations on production traces, the approach improves TCO savings by up to $3.47\times$ over state-of-the-art baselines, with an additional $3.22\%$ savings reported in production contexts, while keeping inference latency low and the models interpretable. The work demonstrates that cross-layer ML for storage systems is viable, robust, and generalizable, and proposes this design philosophy for deploying ML in complex infrastructure.
Abstract
Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. Machine learning (ML)-based methods for solving key problems in storage system efficiency, such as data placement, have shown significant promise. However, there are few known practical deployments of such methods. Studying this problem in the context of real-world hyperscale data centers at Google, we identify a number of challenges that we believe cause this lack of practical adoption. Specifically, prior work assumes a monolithic model that resides entirely within the storage layer, an unrealistic assumption in real-world deployments with frequently changing workloads. To address this problem, we introduce a cross-layer approach where workloads instead "bring their own model". This strategy moves ML out of the storage system and instead allows each workload to train its own lightweight model at the application layer, capturing the workload's specific characteristics. These small, interpretable models generate predictions that guide a co-designed scheduling heuristic at the storage layer, enabling adaptation to diverse online environments. We build a proof-of-concept of this approach in a production distributed computation framework at Google. Evaluations in a test deployment and large-scale simulation studies using production traces show improvements of as much as $3.47\times$ in TCO savings compared to state-of-the-art baselines.
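To make the cross-layer split concrete, the following is a minimal Python sketch of how an application-layer prediction might feed a storage-layer heuristic. It is an illustration under stated assumptions, not the system described in the paper: the names `PlacementHint`, `predict_importance`, and `StoragePlacer`, the I/O-density rule, the three importance levels, the SSD/HDD tiers, and the utilization thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlacementHint:
    """Hint attached by the application layer to a write (hypothetical)."""
    job_id: str
    importance: int  # 0 = least important; higher = more benefit from fast storage


def predict_importance(features: dict) -> int:
    """Stand-in for a small, interpretable per-workload model.

    Assumption: importance is ranked by I/O density (bytes read per byte
    stored). A real workload would train its own model on its own features.
    """
    density = features["bytes_read"] / max(features["bytes_stored"], 1)
    if density >= 10:
        return 2
    if density >= 1:
        return 1
    return 0


class StoragePlacer:
    """Storage-layer heuristic that consumes hints without running any model."""

    MAX_IMPORTANCE = 2

    def __init__(self, ssd_capacity: int, threshold: int = 1):
        self.ssd_capacity = ssd_capacity
        self.ssd_used = 0
        self.threshold = threshold  # minimum importance admitted to SSD

    def place(self, hint: PlacementHint, size: int) -> str:
        # Adapt to the online environment: raise the bar when the SSD tier
        # is nearly full, lower it when the tier is mostly empty.
        utilization = self.ssd_used / self.ssd_capacity
        if utilization > 0.9:
            self.threshold = min(self.threshold + 1, self.MAX_IMPORTANCE)
        elif utilization < 0.5:
            self.threshold = max(self.threshold - 1, 0)

        fits = self.ssd_used + size <= self.ssd_capacity
        if hint.importance >= self.threshold and fits:
            self.ssd_used += size
            return "SSD"
        return "HDD"


# Usage: the application computes the hint, the storage layer decides placement.
placer = StoragePlacer(ssd_capacity=100)
hint = PlacementHint("job-a", predict_importance({"bytes_read": 1000, "bytes_stored": 50}))
tier = placer.place(hint, size=95)
```

In this sketch the storage layer sees only a small integer per request, so a workload could retrain or replace its model without any change to the storage system, which is the decoupling the abstract argues for.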
