FPGA or GPU? Analyzing comparative research for application-specific guidance

Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal

TL;DR

The paper addresses the challenge of choosing between FPGA and GPU accelerators beyond generic performance metrics by synthesizing a wide range of comparative studies and organizing them by application domain and workload characteristics. It introduces an LLVM-based dependency analysis and application profiling framework to gauge when decouplable variables and loop-carried dependencies, together with memory-access patterns, favor one architecture over the other. Key contributions include a domain-specific categorization of workloads, a structured performance and energy-efficiency comparison, and actionable guidance for accelerator selection. The findings have practical impact for researchers and practitioners aiming to optimize performance, energy efficiency, and programmability, and point to future work in toolchains and hybrid architectures that automate hardware recommendations.

Abstract

The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions, each excelling in specific domains. Although there is substantial research comparing FPGAs and GPUs, most of it focuses on performance metrics and offers limited insight into which types of applications benefit most from each accelerator. This paper aims to bridge this gap by synthesizing insights from various research articles to guide users in selecting the appropriate accelerator for domain-specific applications. By categorizing the reviewed studies and analyzing key performance metrics, this work highlights the strengths, limitations, and ideal use cases for FPGAs and GPUs. The findings offer actionable recommendations, helping researchers and practitioners navigate trade-offs in performance, energy efficiency, and programmability.

Paper Structure

This paper contains 9 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Memory stalls and bandwidth utilization for Hotspot and BFS applications