PyPOD-GP: Using PyTorch for Accelerated Chip-Level Thermal Simulation of the GPU

Neil He; Ming-Cheng Cheng; Yu Liu

TL;DR

PyPOD-GP is a GPU-optimized POD-GP library for chip-level thermal simulation that addresses the runtime bottlenecks of CPU-based FEM toolchains such as FEniCS and PETSc. It expresses temperature as a linear combination of POD modes, $T(\vec{r}, t) = \sum_{i=1}^d b_i(t) \eta_i(\vec{r})$, with the reduced coefficients $\vec{b}$ governed by $C \dfrac{d\vec{b}}{dt} + G \vec{b} = \vec{P}$, a small system that runs efficiently on a GPU. The library provides a three-stage workflow (Data Processing, Model Training, Temperature Inference) and reports speedups of over $23.4\times$ in training and over $10\times$ in inference, with device-layer error around $1.2\%$. By mapping domain computations onto the GPU and running inference for multiple models in parallel, PyPOD-GP enables high-resolution, real-time thermal monitoring for large GPUs and facilitates broader adoption of POD-based methods in physics simulations. Overall, it demonstrates a practical, scalable route to accurate, fast chip-level thermal analysis on modern many-core architectures.
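The reduced model above can be illustrated with a minimal sketch. This is not the PyPOD-GP API: the function names `integrate_reduced_ode` and `reconstruct_temperature`, the choice of backward Euler as the time integrator, and the toy values for $C$, $G$, $\vec{P}$, and the modes are all assumptions made for illustration. NumPy is used to keep the example small; the same steps could plausibly be ported to PyTorch tensors (for example with `torch.linalg.solve`).

```python
import numpy as np


def integrate_reduced_ode(C, G, power_fn, b0, dt, n_steps):
    """Integrate C db/dt + G b = P(t) with backward Euler (an assumed scheme)."""
    b = np.empty((n_steps + 1, b0.shape[0]))
    b[0] = b0
    # Backward Euler gives (C/dt + G) b_{n+1} = (C b_n)/dt + P(t_{n+1}).
    lhs = C / dt + G
    for n in range(n_steps):
        rhs = (C @ b[n]) / dt + power_fn((n + 1) * dt)
        b[n + 1] = np.linalg.solve(lhs, rhs)
    return b


def reconstruct_temperature(b, modes):
    """Evaluate T(r, t) = sum_i b_i(t) * eta_i(r).

    b has shape (n_times, d); modes has shape (d, n_points).
    """
    return b @ modes


# Toy two-mode system (illustrative values only, not taken from the paper).
C = np.eye(2)
G = np.diag([1.0, 2.0])
P = np.array([1.0, 2.0])  # constant reduced power vector
modes = 0.5 * np.array([[1.0, 1.0, 1.0, 1.0],
                        [1.0, -1.0, 1.0, -1.0]])  # orthonormal rows

b = integrate_reduced_ode(C, G, lambda t: P, np.zeros(2), dt=0.1, n_steps=500)
T = reconstruct_temperature(b, modes)

# After many steps the coefficients should settle at the steady state G b = P.
b_steady = np.linalg.solve(G, P)
print(np.allclose(b[-1], b_steady))
```

Because the system has only $d$ unknowns (two here), each time step is a small dense solve, and the full temperature field is recovered with a single matrix product against the stored modes.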

Abstract

The rising demand for high-performance computing (HPC) has made full-chip dynamic thermal simulation in many-core GPUs critical for optimizing performance and extending device lifespans. Proper orthogonal decomposition (POD) with Galerkin projection (GP) has been shown to offer high accuracy and massive runtime improvements over direct numerical simulation (DNS). However, previous implementations of POD-GP rely on MPI-based libraries such as PETSc and FEniCS and face significant runtime bottlenecks. We propose a $\textbf{Py}$Torch-based $\textbf{POD-GP}$ library (PyPOD-GP), a GPU-optimized library for chip-level thermal simulation. PyPOD-GP achieves over $23.4\times$ speedup in training and over $10\times$ speedup in inference on a GPU with over 13,000 cores, with just $1.2\%$ error over the device layer.
