PyPOD-GP: Using PyTorch for Accelerated Chip-Level Thermal Simulation of the GPU

Neil He, Ming-Cheng Cheng, Yu Liu

TL;DR

PyPOD-GP introduces a GPU-optimized POD-GP library for chip-level thermal simulation, addressing the runtime bottlenecks of earlier implementations built on MPI-based libraries such as FEniCS and PETSc. It formulates temperature as a linear combination of POD modes, $T(\vec{r}, t) = \sum_{i=1}^d b_i(t) \eta_i(\vec{r})$, with the reduced coefficients $\vec{b}$ governed by $C \dfrac{d\vec{b}}{dt} + G \vec{b} = \vec{P}$, a small system that maps efficiently onto GPU execution. The library provides a three-stage workflow (Data Processing, Model Training, Temperature Inference) and reports speedups of over $23.4\times$ in training and over $10\times$ in inference, with an error of about $1.2\%$ over the device layer. By leveraging domain-mapped GPU computations and parallel inference for multiple models, PyPOD-GP enables high-resolution, real-time thermal monitoring for large GPUs and facilitates broader adoption of POD-based methods in physics simulations. Overall, PyPOD-GP demonstrates a practical, scalable route to accurate, fast chip-level thermal analysis on modern many-core architectures.
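To make the reduced model concrete, the following is a minimal PyTorch sketch of how a system of the form $C \dfrac{d\vec{b}}{dt} + G \vec{b} = \vec{P}$ could be stepped in time and mapped back to a temperature field. It is not PyPOD-GP's API: the function names, the backward-Euler scheme, the constant power vector, and the placeholder matrices and random "modes" are all assumptions chosen for illustration.

```python
import torch


def simulate_reduced_model(C, G, P, b0, dt, n_steps):
    """Integrate C db/dt + G b = P with backward Euler (assumed scheme, constant P)."""
    A = C / dt + G                      # system matrix of each implicit step
    b = b0.clone()
    history = [b.clone()]
    for _ in range(n_steps):
        rhs = P + (C @ b) / dt          # (C/dt + G) b_new = P + C b_old / dt
        b = torch.linalg.solve(A, rhs)
        history.append(b.clone())
    return torch.stack(history)         # shape: (n_steps + 1, d)


def reconstruct_temperature(coeffs, modes):
    """T(r, t) = sum_i b_i(t) * eta_i(r); modes has shape (d, n_points)."""
    return coeffs @ modes


# Use the GPU when one is available; the sketch also runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
opts = {"dtype": torch.float64, "device": device}
torch.manual_seed(0)

d, n_points = 4, 1000
# Hypothetical placeholder data, not taken from the paper.
C = torch.eye(d, **opts)                                # reduced capacitance matrix
G = torch.diag(torch.tensor([1.0, 2.0, 3.0, 4.0], **opts))  # reduced conductance matrix
P = torch.ones(d, **opts)                               # reduced power vector
modes = torch.randn(d, n_points, **opts)                # stand-in for POD modes eta_i
b0 = torch.zeros(d, **opts)

coeffs = simulate_reduced_model(C, G, P, b0, dt=0.01, n_steps=2000)
T = reconstruct_temperature(coeffs, modes)              # shape: (n_steps + 1, n_points)
```

Because the reduced system has only $d$ unknowns, each time step is a small dense solve, and the reconstruction is a single matrix product over all grid points, which is the kind of operation that maps well onto a GPU.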

Abstract

The rising demand for high-performance computing (HPC) has made full-chip dynamic thermal simulation in many-core GPUs critical for optimizing performance and extending device lifespans. Proper orthogonal decomposition (POD) with Galerkin projection (GP) has been shown to offer high accuracy and massive runtime improvements over direct numerical simulation (DNS). However, previous implementations of POD-GP use MPI-based libraries such as PETSc and FEniCS and face significant runtime bottlenecks. We propose a $\textbf{Py}$Torch-based $\textbf{POD-GP}$ library (PyPOD-GP), a GPU-optimized library for chip-level thermal simulation. PyPOD-GP achieves over $23.4\times$ speedup in training and over $10\times$ speedup in inference on a GPU with over 13,000 cores, with just $1.2\%$ error over the device layer.

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Flow Chart of the Pipeline for PyPOD-GP.
  • Figure 2: GPU Floor Plan.
  • Figure 3: Error at device layer by number of modes from PyPOD-GP.