EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology

Qizhe Wu, Yuchen Gui, Zhichen Zeng, Xiaotian Wang, Huawen Liang, Xi Jin

TL;DR

This work proposes EN-T, a novel architecture that reduces chip area and power consumption while remaining compatible with existing tensor processing units. Across computational scales from 256 GOPS to 4 TOPS, it improves average area efficiency by 8.7% to 12.2% and energy efficiency by 13.0% to 17.5%.

Abstract

Tensor computations, with matrix multiplication as the primary operation, are fundamental to data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the demand for tensor computations has also increased significantly. To meet this demand, several research institutions have started developing dedicated hardware for tensor computations. To further improve the computational performance of tensor processing units, we reexamine computation reuse, an issue previously overlooked in existing architectures. As a result, we propose a novel EN-T architecture that reduces chip area and power consumption. Furthermore, our method is compatible with existing tensor processing units. We evaluated our method on prevalent microarchitectures; the results demonstrate average area-efficiency improvements of 8.7%, 12.2%, and 11.0% for tensor computing units at computational scales of 256 GOPS, 1 TOPS, and 4 TOPS, respectively, with corresponding energy-efficiency improvements of 13.0%, 17.5%, and 15.5%.

Paper Structure

This paper contains 13 sections, 17 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: (a) Energy efficiency and area efficiency of mainstream 7nm AI accelerators. Area (b) and power (c) breakdown of TPU die.
  • Figure 2: Mainstream microarchitectures of Tensor Computing Units in recent years.
  • Figure 3: (a) The internal computational abstraction of PE. (b) From the perspective of TCU. (c) The proposed architecture.
  • Figure 4: Modified Booth multiplier.
  • Figure 5: Modified encoder logic.
  • ...and 7 more figures