Toward Physics-Informed Machine Learning for Data Center Operations: A Tropical Case Study
Ruihang Wang, Zhiwei Cao, Qingang Zhang, Rui Tan, Yonggang Wen, Tommy Leung, Stuart Kennedy, Justin Teoh
TL;DR
The paper addresses the high cooling costs and reliability challenges of data centers (DCs) in tropical climates by introducing multiphysics-informed machine learning (MPIML), a framework that combines physics priors with data-driven methods. It presents a three-engine architecture (DCLib, DCTwin, DCBrain), formalizes the multiphysics modeling and optimization tasks as three problems (P1, P2, and P3), and maps the evolution of operational intelligence onto predictive, prescriptive, and adaptive stages. A case study in an industry-grade tropical DC shows that physics-informed surrogates achieve lower prediction errors (≈5% versus 7–9% for purely data-driven approaches) and enable energy-efficient, safety-aware control policies, yielding substantial CO2 and cost savings. The work also identifies geometry adaptability, hybrid multiscale modeling, and uncertainty quantification as critical future directions for robust, scalable deployment. Overall, MPIML offers a practical path to safer, greener DC operations with reduced data requirements and improved extrapolation.
Abstract
Data centers are the backbone of computing capacity. Operating data centers in tropical regions poses unique challenges because ambient temperature and relative humidity remain high throughout the year. These conditions increase the cooling cost of keeping computing systems reliable. While existing machine learning-based approaches have shown potential to make operations more proactive and intelligent, their deployment remains uncertain because of concerns about model extrapolation and the associated system safety risks. To address these concerns, this article proposes incorporating the physical characteristics of data centers into traditional data-driven machine learning solutions. We begin by introducing the data center system, including the relevant multiphysics processes and the availability of data and physics knowledge. Next, we outline the associated modeling and optimization problems and propose an integrated, physics-informed machine learning system to address them. Using the proposed system, we present relevant applications across varying levels of operational intelligence. A case study on an industry-grade tropical data center demonstrates the effectiveness of our approach. Finally, we discuss key challenges and highlight potential future directions.
