Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks

Tejas Raja

TL;DR

The results show that selecting the right data flow for a given matrix configuration can drastically reduce energy consumption. These findings offer insights into optimizing hardware for AI and machine learning applications and into designing energy-efficient DNN accelerators.

Abstract

This paper discusses how systolic arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficient matrix multiplication is more critical than ever. The paper covers the three main systolic array data flows: Weight Stationary (WS), Input Stationary (IS), and Output Stationary (OS). The energy consumption and efficiency of each data flow are calculated across various matrix sizes using the SCALE-Sim simulator. The results show that selecting the right data flow for a given matrix configuration can drastically reduce energy consumption. The conclusions offer insights into optimizing hardware for AI and machine learning applications, with potential improvements in the design of energy-efficient DNN accelerators.

Paper Structure

This paper contains 14 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: ChatGPT Parameters vs. Year [ChatGPT]
  • Figure 2: Von Neumann Diagram [arikpo2007neumann]
  • Figure 3: Peak Flops Comparison [DomainSpecific]
  • Figure 4: Spatial Rows and Columns to Systolic Array
  • Figure 5: Dimensions of Weight, Input, and Output Matrix
  • ...and 1 more figure