Thermal Analysis for NVIDIA GTX480 Fermi GPU Architecture

Savinay Nagendra

TL;DR

This work presents a methodology for analyzing the thermal behavior of the NVIDIA GTX480 Fermi GPU by combining a four-layer (Silicon|TIM|Silicon|TIM) 3D floor plan with CUDA kernels and a power-to-thermal workflow. The kernels are a linearized matrix-multiplication kernel, a cache-tiled variant that uses scratchpad memory, and a Needleman-Wunsch dynamic-programming kernel. Power is modeled in GPGPU-Sim with GPUWattch, and the resulting power logs are converted to ptrace files from which HotSpot generates heat maps for tensor sizes from $100$ to $800$. The results show that heat dissipation increases with tensor size but saturates beyond a threshold and becomes more uniform as more cores participate, while Needleman-Wunsch heats the DRAM more than matrix multiplication because of its data movement and cache conflicts. The study also shows that cache tiling reduces DRAM traffic and highlights the trade-offs between kernel structure and the memory hierarchy; the observed trends agree with theoretical expectations. The approach offers practical guidance for GPU thermal design and performance planning in 3D-IC architectures.
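The claim that cache tiling reduces DRAM traffic can be illustrated with a back-of-the-envelope count of global-memory loads for an $N \times N$ matrix multiplication. The sketch below is not taken from the paper: the function names are hypothetical, the tile size of 16 is an assumed value, and the model ignores the hardware L1/L2 caches and assumes $N$ is a multiple of the tile size.

```python
def naive_global_loads(n):
    """Loads when every thread reads a full row of A and a full column of B."""
    # n*n output elements, each needing n elements of A and n elements of B.
    return 2 * n ** 3


def tiled_global_loads(n, tile):
    """Loads when each thread block stages tile x tile sub-matrices in scratchpad."""
    if n % tile != 0:
        raise ValueError("this simplified model assumes n is a multiple of tile")
    blocks = (n // tile) ** 2      # one thread block per output tile
    steps = n // tile              # tiles swept along the shared dimension
    per_step = 2 * tile * tile     # one tile of A plus one tile of B
    return blocks * steps * per_step


if __name__ == "__main__":
    n, tile = 800, 16              # tile = 16 is an assumed value
    naive = naive_global_loads(n)
    tiled = tiled_global_loads(n, tile)
    print(f"naive: {naive}, tiled: {tiled}, reduction: {naive // tiled}x")
```

Under these assumptions the tiled kernel issues $2N^3/T$ loads instead of $2N^3$, so the reduction factor equals the tile width $T$.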

Abstract

In this project, we design a four-layer (Silicon|TIM|Silicon|TIM) 3D floor plan for the NVIDIA GTX480 Fermi GPU architecture and compare heat dissipation and power trends for matrix multiplication and Needleman-Wunsch kernels. First, CUDA kernels for the two algorithms are written. These kernels are compiled and executed with the GPGPU-Sim simulator to extract power logs for varying tensor sizes. The power logs are converted to ptrace files with an automation script written in Python. The 3D floor plan and the generated ptrace files are given to HotSpot, which generates thermal heat maps showing heat dissipation for the various components of the Fermi architecture. We observe these heat dissipation trends for both kernels over multiple tensor sizes to draw qualitative conclusions, and relate them to the behavior and execution patterns of each kernel. We observe that increasing the tensor size increases heat dissipation in the components of the Fermi architecture. However, the chip temperature saturates beyond a particular tensor size and remains constant thereafter. Heat dissipation is non-uniform for smaller tensor sizes and becomes more uniform beyond a certain tensor size. This indicates that, beyond that size, more cores of the architecture take part in the computation, which results in an almost constant temperature. We also observe that Needleman-Wunsch moves more data between DRAM and the caches, and therefore shows higher heat dissipation in the DRAMs than matrix multiplication for the same tensor size. These observations agree with the theoretical behavior of the two algorithms.
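The conversion from power logs to ptrace files can be sketched as follows. This is a minimal illustration and not the authors' script: the input format (`name = value` lines, with a blank line between sampling intervals) and the unit names `SM0`, `DRAM0`, and `L2` are assumptions made for the example. The output follows the HotSpot ptrace layout, where the first row lists the floor plan unit names and each later row gives one power sample in watts per unit.

```python
def parse_power_log(text):
    """Parse an ASSUMED log format: 'name = value' lines, blank line between samples."""
    samples, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sampling interval.
            if current:
                samples.append(current)
                current = {}
            continue
        name, _, value = line.partition("=")
        current[name.strip()] = float(value)
    if current:
        samples.append(current)
    return samples


def write_ptrace(samples, unit_names):
    """Return ptrace text: a header of unit names, then one row of watts per sample."""
    rows = ["\t".join(unit_names)]
    for sample in samples:
        # Units absent from a sample are treated as drawing no power.
        rows.append("\t".join(f"{sample.get(u, 0.0):.4f}" for u in unit_names))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Hypothetical two-interval log; names must match the floor plan units.
    log = "SM0 = 1.5\nDRAM0 = 0.25\n\nSM0 = 2.0\nDRAM0 = 0.5\n"
    print(write_ptrace(parse_power_log(log), ["SM0", "DRAM0", "L2"]), end="")
```

The column order passed to `write_ptrace` has to match the unit names used in the floor plan file, since HotSpot pairs power columns with floor plan blocks by name.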

Paper Structure (20 sections, 25 figures)

Figures (25)

  • Figure 1: GPUWattch power model integrated with GPGPU-Sim
  • Figure 2: Linearized Matrix Multiplication Kernel matmul1
  • Figure 3: Linearized Matrix Multiplication Kernel optimized with Scratchpad memory.
  • Figure 4: Needleman-Wunsch kernel optimized with Scratchpad memory.
  • Figure 5: Needleman-Wunsch Pseudocode
  • ...and 20 more figures