PyPOD-GP: Using PyTorch for Accelerated Chip-Level Thermal Simulation of the GPU
Neil He, Ming-Cheng Cheng, Yu Liu
TL;DR
PyPOD-GP introduces a GPU-optimized POD-GP library for chip-level thermal simulation, addressing the runtime bottlenecks of earlier CPU-based implementations built on MPI libraries such as PETSc and FEniCS. It formulates temperature as a linear combination of POD modes, $T(\vec{r}, t) = \sum_{i=1}^d b_i(t) \eta_i(\vec{r})$, with the reduced coefficients $\vec{b}$ governed by $C \dfrac{d\vec{b}}{dt} + G \vec{b} = \vec{P}$, a small $d$-dimensional system that executes efficiently on a GPU. The library provides a three-stage workflow (Data Processing, Model Training, Temperature Inference) and reports speedups of over $23.4\times$ in training and over $10\times$ in inference, with an error of around $1.2\%$ over the device layer. By mapping the computations onto the GPU and running inference for multiple models in parallel, PyPOD-GP enables high-resolution, real-time thermal monitoring for large GPUs and facilitates broader adoption of POD-based methods in physics simulations. Overall, PyPOD-GP demonstrates a practical, scalable route to accurate, fast chip-level thermal analysis on modern many-core architectures.
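To make the reduced system in the TL;DR concrete, the following is a minimal PyTorch sketch of integrating $C \dfrac{d\vec{b}}{dt} + G \vec{b} = \vec{P}$ and reconstructing the temperature from the POD modes. It is an illustration under stated assumptions, not PyPOD-GP's API: the function names `integrate_reduced_ode` and `reconstruct_temperature` are hypothetical, backward Euler is an assumed time integrator, the source term is taken as constant, and the matrices and modes are random placeholders rather than trained POD-GP data.

```python
import torch


def integrate_reduced_ode(C, G, P, b0, dt, n_steps):
    """Backward Euler for C db/dt + G b = P with a constant source P.

    Each step solves (C/dt + G) b_next = (C/dt) b + P.
    Returns a tensor of shape (n_steps + 1, d) holding b at every step.
    (Hypothetical helper; the integrator is an assumption, not the paper's.)
    """
    A = C / dt + G                       # left-hand side is the same at every step
    b = b0.clone()
    history = [b]
    for _ in range(n_steps):
        rhs = (C @ b) / dt + P
        b = torch.linalg.solve(A, rhs)   # small d x d solve
        history.append(b)
    return torch.stack(history)


def reconstruct_temperature(b, modes):
    """T(r, t) = sum_i b_i(t) * eta_i(r).

    b has shape (n_times, d); modes has shape (d, n_points), one row per mode.
    """
    return b @ modes


# Placeholder data: d = 3 modes sampled at 5 spatial points.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
d, n_points = 3, 5
kw = dict(dtype=torch.float64, device=device)

M = torch.rand(d, d, **kw)
G = M @ M.T + d * torch.eye(d, **kw)     # symmetric positive definite placeholder
C = torch.eye(d, **kw)                   # placeholder for the reduced capacitance matrix
P = torch.ones(d, **kw)                  # constant reduced source term
modes = torch.rand(d, n_points, **kw)    # placeholder for the POD modes eta_i
b0 = torch.zeros(d, **kw)

traj = integrate_reduced_ode(C, G, P, b0, dt=0.1, n_steps=200)
T = reconstruct_temperature(traj, modes)  # shape (n_steps + 1, n_points)

# With constant P, the coefficients settle at the steady state G b = P.
b_steady = torch.linalg.solve(G, P)
```

Because the reduced system has only $d$ unknowns, each time step is a small dense solve, and the reconstruction is a single matrix product over all spatial points, which is the kind of operation that maps well onto a GPU.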
Abstract
The rising demand for high-performance computing (HPC) has made full-chip dynamic thermal simulation of many-core GPUs critical for optimizing performance and extending device lifespans. Proper orthogonal decomposition (POD) with Galerkin projection (GP) has been shown to offer high accuracy and massive runtime improvements over direct numerical simulation (DNS). However, previous implementations of POD-GP use MPI-based libraries like PETSc and FEniCS and face significant runtime bottlenecks. We propose a $\textbf{Py}$Torch-based $\textbf{POD-GP}$ library (PyPOD-GP), a GPU-optimized library for chip-level thermal simulation. PyPOD-GP achieves over $23.4\times$ speedup in training and over $10\times$ speedup in inference on a GPU with over 13,000 cores, with just $1.2\%$ error over the device layer.
