A Survey on Hardware Accelerators for Large Language Models

Christoforos Kachris

TL;DR

The paper surveys hardware accelerators for Transformer-based large language models, covering FPGA, CPU/GPU, ASIC, and in-memory approaches. It catalogues architectural strategies, trade-offs, and quantitative performance metrics, highlighting how accelerators reduce compute, memory bandwidth, and energy demands. Key findings show that ASIC and in-memory designs achieve the largest speedups and energy-efficiency gains, while FPGAs offer flexible, deployable improvements; software and hybrid approaches on CPUs/GPUs also yield meaningful gains. The results underscore the substantial practical impact of specialized hardware on enabling scalable, energy-efficient LLM deployment in data centers and edge settings.

Abstract

Large Language Models (LLMs) have emerged as powerful tools for natural language processing tasks, revolutionizing the field with their ability to understand and generate human-like text. As the demand for more sophisticated LLMs continues to grow, there is a pressing need to address the computational challenges associated with their scale and complexity. This paper presents a comprehensive survey on hardware accelerators designed to enhance the performance and energy efficiency of Large Language Models. By examining a diverse range of accelerators, including GPUs, FPGAs, and custom-designed architectures, we explore the landscape of hardware solutions tailored to meet the unique computational demands of LLMs. The survey encompasses an in-depth analysis of architecture, performance metrics, and energy efficiency considerations, providing valuable insights for researchers, engineers, and decision-makers aiming to optimize the deployment of LLMs in real-world applications.

Paper Structure (39 sections, 1 table)