Toward Physics-Informed Machine Learning for Data Center Operations: A Tropical Case Study

Ruihang Wang, Zhiwei Cao, Qingang Zhang, Rui Tan, Yonggang Wen, Tommy Leung, Stuart Kennedy, Justin Teoh

TL;DR

The paper tackles the high cooling costs and reliability challenges of data centers (DCs) in tropical climates by introducing multiphysics-informed machine learning (MPIML), a framework that blends physics priors with data-driven methods. It presents a three-engine architecture (DCLib, DCTwin, DCBrain), formalizes the multiphysics modeling and optimization problems as P1, P2, and P3, and maps the evolution of operational intelligence onto predictive, prescriptive, and adaptive stages. A case study in an industry-grade tropical DC shows that physics-informed surrogates achieve lower prediction errors (≈5% versus 7–9% for purely data-driven approaches) and enable energy-efficient, safety-aware control policies that yield substantial CO2 and cost savings. The work also identifies geometry adaptability, hybrid multiscale modeling, and uncertainty quantification as critical future directions for robust, scalable deployment. Overall, MPIML offers a practical pathway to safer, greener DC operations with reduced data requirements and improved extrapolation capabilities.

Abstract

Data centers are the backbone of computing capacity. Operating data centers in tropical regions poses unique challenges because of consistently high ambient temperatures and elevated relative humidity throughout the year. These conditions raise the cooling cost of keeping the computing systems reliable. While existing machine learning-based approaches have shown potential to make operations more proactive and intelligent, their deployment remains questionable because of concerns about the models' extrapolation capability and the associated system-safety risks. To address these concerns, this article proposes incorporating the physical characteristics of data centers into traditional data-driven machine learning solutions. We begin by introducing the data center system, including the relevant multiphysics processes and the availability of data and physics. Next, we outline the associated modeling and optimization problems and propose an integrated, physics-informed machine learning system to address them. Using the proposed system, we present relevant applications across varying levels of operational intelligence. A case study on an industry-grade tropical data center demonstrates the effectiveness of our approach. Finally, we discuss key challenges and highlight potential future directions.
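One common way to incorporate physical characteristics into a data-driven model is a composite training loss: the usual data-fitting term plus a penalty on the residual of a known physical law. The sketch below illustrates that general technique only; it is not the authors' MPIML formulation. The physics prior (a steady-state sensible-heat balance, m_dot * c_p * (T_return - T_supply) ≈ P_IT, for a single cooling unit), the function name `physics_informed_loss`, the weight `lam`, and the normalization of the residual by `p_it` are all hypothetical choices made for illustration.

```python
import numpy as np

# Specific heat of air near room temperature, J/(kg*K).
CP_AIR = 1005.0


def physics_informed_loss(t_return_pred, t_return_obs, t_supply, m_dot, p_it, lam=1.0):
    """Return (total, data_loss, physics_loss) for a batch of operating points.

    Hypothetical physics prior: in steady state, the sensible heat removed by
    the cooling air, m_dot * CP_AIR * (T_return - T_supply), equals the IT
    power p_it (W). Temperatures in deg C (differences in K), m_dot in kg/s.
    """
    t_pred = np.asarray(t_return_pred, dtype=float)
    t_obs = np.asarray(t_return_obs, dtype=float)
    t_sup = np.asarray(t_supply, dtype=float)
    m = np.asarray(m_dot, dtype=float)
    p = np.asarray(p_it, dtype=float)

    # Data term: mean squared error against measured return temperatures (K^2).
    data_loss = np.mean((t_pred - t_obs) ** 2)

    # Physics term: energy-balance residual relative to the IT load (dimensionless).
    heat_removed = m * CP_AIR * (t_pred - t_sup)
    residual = (heat_removed - p) / p
    physics_loss = np.mean(residual ** 2)

    # lam trades off fitting the data against respecting the energy balance.
    total = data_loss + lam * physics_loss
    return total, data_loss, physics_loss


if __name__ == "__main__":
    # Consistent point: 2 kg/s * 1005 J/(kg*K) * 10 K = 20.1 kW of IT load.
    print(physics_informed_loss([28.0], [28.0], [18.0], [2.0], [20100.0]))
    # An over-predicted return temperature is penalized by both terms.
    print(physics_informed_loss([29.0], [28.0], [18.0], [2.0], [20100.0]))
```

With m_dot = 2 kg/s, T_supply = 18 °C and P_IT = 20.1 kW, the balance implies T_return = 28 °C, so a 28 °C prediction incurs zero loss, while a 29 °C prediction is penalized both for missing the measurement and for implying 10% more heat removal than the IT load produces. Because the physics term needs no measured label, it can also be evaluated at operating points outside the training data, which is one reason such priors are expected to help extrapolation.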

Paper Structure

This paper contains 37 sections, 6 equations, and 8 figures.

Figures (8)

  • Figure 1: A chronological overview of the technical evolution of DC optimization. Emerging PIML-based approaches that integrate data and physics are promising for achieving safe and efficient DC operations.
  • Figure 2: A typical chilled-water-cooled DC consists of three interconnected subsystems, i.e., the computing, cooling, and electrical supply systems. These systems involve different physical processes.
  • Figure 3: The availability of data at different DC stages and the different forms of physics completeness. The middle region is typical of a DC, where some physics is known but some parameter values or terms may be missing.
  • Figure 4: Architecture of the core engines to empower PIML for DC modeling and optimization. The system consists of three engines for DC modeling and optimization, i.e., the DCLib, DCTwin, and DCBrain.
  • Figure 5: MPIML intelligence evolution in three levels. The predictive level aims to perform accurate and timely predictions. The prescriptive level aims to facilitate decision-making with the previously developed models. Finally, the policies and models are deployed to the physical DC with continuous adaptation.