
MFIT: Multi-Fidelity Thermal Modeling for 2.5D and 3D Multi-Chiplet Architectures

Lukas Pfromm, Alish Kanani, Harsh Sharma, Parth Solanki, Eric Tervo, Jaehyun Park, Janardhan Rao Doppa, Partha Pratim Pande, Umit Y. Ogras

TL;DR

MFIT introduces a unified, multi-fidelity framework for thermal modeling of 2.5D and 3D chiplet architectures, spanning fine-grained FEM, abstract FEM, thermal RC, and discrete state-space (DSS) models. Starting from a high-fidelity FEM reference, MFIT systematically derives faster abstractions with controlled accuracy loss (<0.5$^{\circ}$C for abstract FEM and <1.7$^{\circ}$C for RC) to support design space exploration. It then derives DSS models that run in milliseconds for runtime thermal management. Evaluation on 2.5D systems with 16, 36, and 64 chiplets and on a 16×3 3D stack demonstrates speedups from days to seconds or milliseconds with FEM-like accuracy. The framework is released as open source. The approach enables thermally aware architectural optimization and runtime cooling decisions for heterogeneous 2.5D/3D packages, including MI300A-scale systems.
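The thermal RC level models the package as a lumped network. Each node (for example a chiplet or an interposer region) is joined to its neighbors by thermal conductances. At steady state, the node temperatures satisfy the linear system G·T = P + g_amb·T_amb, and the capacitances drop out.

The sketch below illustrates that idea under stated assumptions. The three-node topology and all conductance and power values are hypothetical. So are the helper names `steady_state_temps` and `solve_linear`. MFIT's actual RC topology, and the way it tunes RC parameters against abstract FEM, are not described in this summary.

```python
from typing import List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def steady_state_temps(
    n_nodes: int,
    links: List[Tuple[int, int, float]],     # (node_i, node_j, conductance in W/K)
    ambient_links: List[Tuple[int, float]],  # (node, conductance to ambient in W/K)
    power: List[float],                      # dissipated power per node in W
    t_amb: float,                            # ambient temperature in deg C
) -> List[float]:
    """Steady state of a lumped thermal RC network: G T = P + g_amb * T_amb."""
    g = [[0.0] * n_nodes for _ in range(n_nodes)]
    rhs = list(power)
    for i, j, cond in links:
        # Each thermal resistor adds a symmetric stamp to the conductance matrix.
        g[i][i] += cond
        g[j][j] += cond
        g[i][j] -= cond
        g[j][i] -= cond
    for i, cond in ambient_links:
        # Path to the heat sink: ambient is a fixed-temperature boundary.
        g[i][i] += cond
        rhs[i] += cond * t_amb
    return solve_linear(g, rhs)


# Hypothetical 3-node example: two chiplets (0, 1) on an interposer (2),
# with the interposer coupled to ambient through the package and heat sink.
temps = steady_state_temps(
    n_nodes=3,
    links=[(0, 2, 2.0), (1, 2, 2.0)],
    ambient_links=[(2, 1.0)],
    power=[10.0, 5.0, 0.0],
    t_amb=25.0,
)
print([round(t, 3) for t in temps])  # [45.0, 42.5, 40.0]
```

The result is easy to check by hand. All 15 W must leave through the 1 W/K ambient path, so the interposer sits 15 °C above ambient. Each chiplet then sits P/g above the interposer. Solving one small linear system per candidate floorplan is what makes an RC model cheap enough for design space exploration, compared with a full FEM run.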

Abstract

Rapidly evolving artificial intelligence and machine learning applications require ever-increasing computational capabilities, while monolithic 2D design technologies approach their limits. Heterogeneous integration of smaller chiplets using a 2.5D silicon interposer and 3D packaging has emerged as a promising paradigm to overcome these limits and meet performance demands. These approaches offer significantly lower cost and higher manufacturing yield than monolithic 2D integrated circuits. However, the compact arrangement and high compute density exacerbate thermal management challenges, potentially compromising performance. Addressing these thermal modeling challenges is critical, especially as system sizes grow and different design stages require different levels of accuracy and speed. Since no single thermal modeling technique meets all these needs, this paper introduces MFIT, a range of multi-fidelity thermal models that balance accuracy and speed. These multi-fidelity models can enable efficient design space exploration and runtime thermal management. Our extensive testing on systems with 16, 36, and 64 2.5D integrated chiplets and 16×3 3D integrated chiplets demonstrates that these models can reduce execution times from days to seconds and milliseconds with negligible loss in accuracy.

Paper Structure

This paper contains 24 sections, 15 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: 2.5D/3D integrated chiplet systems considered in this work, showing the chiplets, interposer, and part of the substrate.
  • Figure 2: Summary of the multi-fidelity thermal models. (1) Fine-grained FEM models capture precise geometry but are too complex to simulate the entire chiplet-based system. (2) Abstracted FEM models are derived from the fine-grained model to simulate large-scale systems with negligible impact on accuracy ($<0.5^{\circ}$C, which is 0.5--1% around the temperatures of interest). (3) Since abstract FEM models are still too slow for design space exploration (DSE), they are used to tune thermal RC circuit models, which introduce less than 1.7$^{\circ}$C of error (1--3.5% around the temperatures of interest). (4) Further abstraction reduces the execution time to milliseconds using discrete state-space (DSS) models developed for specific system configurations, enabling runtime thermal management.
  • Figure 3: An illustration of the FEM pipeline.
  • Figure 4: The proposed workflow process to produce the multi-fidelity set of thermal models, starting with the most accurate yet slow models and deriving faster models.
  • Figure 5: Cross section of a 2.5D integrated system on Si-interposer, showing abstracted blocks for the $\mu$-bumps, C4 bumps, and link structures.
  • ...and 6 more figures
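Figure 2 attributes millisecond execution times to DSS models. One way to see why: once an RC network C·dθ/dt = -G·θ + P is discretized with a fixed time step dt, each runtime update is a fixed-size matrix-vector product θ[k+1] = A·θ[k] + B·P[k].

The sketch below makes several assumptions:
- It uses forward-Euler discretization, so A = I - dt·C⁻¹·G and B = dt·C⁻¹.
- θ is temperature rise over ambient.
- The network is the same hypothetical three-node example as above, with assumed heat capacities.

The discretization MFIT actually uses is not stated in this summary. Forward Euler is only stable while dt times the largest eigenvalue of C⁻¹·G stays below 2. An exact zero-order-hold discretization via the matrix exponential is a common alternative without that restriction. The helper names `build_dss` and `dss_step` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def build_dss(g: Matrix, cap: List[float], dt: float):
    """Forward-Euler discretization of C dθ/dt = -G θ + P.

    Returns (A, B) with θ[k+1] = A θ[k] + B P[k], where
    A = I - dt C^-1 G and B = dt C^-1 (C is diagonal).
    """
    n = len(cap)
    a = [[(1.0 if i == j else 0.0) - dt * g[i][j] / cap[i] for j in range(n)]
         for i in range(n)]
    b = [[dt / cap[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return a, b


def dss_step(a: Matrix, b: Matrix, theta: List[float], p: List[float]) -> List[float]:
    """One runtime update: a fixed-cost pass over the A and B matrices."""
    n = len(theta)
    return [sum(a[i][j] * theta[j] + b[i][j] * p[j] for j in range(n))
            for i in range(n)]


# Hypothetical network: G already includes the 1.0 W/K ambient path on node 2.
# θ is temperature rise over ambient (deg C), so no separate ambient input is needed.
G = [[2.0, 0.0, -2.0],
     [0.0, 2.0, -2.0],
     [-2.0, -2.0, 5.0]]
CAP = [0.5, 0.5, 5.0]  # assumed heat capacities in J/K
A, B = build_dss(G, CAP, dt=0.01)

power = [10.0, 5.0, 0.0]  # constant power per node in W
first = dss_step(A, B, [0.0, 0.0, 0.0], power)  # one 10 ms step from ambient
theta = [0.0, 0.0, 0.0]
for _ in range(10_000):  # 100 s of simulated time at constant power
    theta = dss_step(A, B, theta, power)
print([round(t, 2) for t in theta])  # converges to about [20.0, 17.5, 15.0]
```

For any stable dt, the discrete fixed point satisfies G·θ = P. The DSS model therefore reproduces the network's steady-state rise exactly (45, 42.5, and 40 °C once the 25 °C ambient is added back), and dt only affects transient accuracy. Because A and B are precomputed for one system configuration, a runtime manager pays only the per-step update. This matches the caption's note that DSS models are built for specific configurations.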