Evaluating Rapid Makespan Predictions for Heterogeneous Systems with Programmable Logic
Martin Wilhelm, Franz Freitag, Max Tzschoppe, Thilo Pionteck
TL;DR
This work tackles rapid makespan prediction for task mapping in heterogeneous systems comprising CPUs, GPUs, and FPGAs. It introduces an OpenCL-based evaluation framework that generates large numbers of random, annotated task graphs and corresponding kernels, so that predictions can be validated quickly against real runs without full hardware-specific implementations. The study analyzes the accuracy and practicality of existing analytical approaches, highlights challenges arising from data transfer, streaming, and device congestion, and demonstrates that rapid predictions can effectively guide design-space exploration, even when FPGA bitstream generation remains a bottleneck. The framework is publicly available and aims to bridge theory and practice by enabling developers to refine makespan prediction algorithms for complex, dataflow-capable accelerators.
Abstract
Heterogeneous computing systems, which combine general-purpose processors with specialized accelerators, are increasingly important for optimizing the performance of modern applications. A central challenge is to decide which parts of an application should be executed on which accelerator or, more generally, how to map the tasks of an application to the available devices. Predicting the impact of a change in a task mapping on the overall makespan is non-trivial. While there are very capable simulators, these generally require a full implementation of the tasks in question, which is particularly time-intensive for programmable logic. A promising alternative is to use a purely analytical function, which allows for very fast predictions but abstracts significantly from reality. Bridging the gap between theory and practice poses a significant challenge to algorithm developers. This paper aims to aid the development of rapid makespan prediction algorithms by providing a highly flexible evaluation framework for heterogeneous systems consisting of CPUs, GPUs, and FPGAs, which can collect real-world makespan results from abstract task graph descriptions. We analyze to what extent actual makespans can be predicted by existing analytical approaches. Furthermore, we present common challenges that arise from high-level characteristics such as data transfer overhead and device congestion in heterogeneous systems.
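To make the notion of a purely analytical makespan function concrete, the following is a minimal sketch and not the model evaluated in this paper. It assumes a task graph annotated with per-device execution times and per-edge data sizes, a fixed mapping of tasks to devices, a linear transfer cost, and devices that execute one task at a time (a crude stand-in for device congestion). Streaming and dataflow execution are deliberately ignored. All names (`predict_makespan`, `exec_time`, `transfer_time`, the device labels) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def predict_makespan(exec_time, edges, mapping, transfer_time):
    """Estimate the makespan of a mapped task graph analytically.

    exec_time:     task -> {device: execution time}
    edges:         (pred, succ) -> data size sent from pred to succ
    mapping:       task -> device the task is assigned to
    transfer_time: function (src_device, dst_device, size) -> time
    """
    # Collect the predecessors of every task.
    preds = {task: set() for task in exec_time}
    for pred, succ in edges:
        preds[succ].add(pred)

    finish = {}       # task -> predicted finish time
    device_free = {}  # device -> time at which it becomes idle

    # Visit tasks so that all predecessors are handled first.
    for task in TopologicalSorter(preds).static_order():
        dev = mapping[task]
        ready = 0.0
        for pred in preds[task]:
            arrival = finish[pred]
            # Data only has to be moved if the devices differ.
            if mapping[pred] != dev:
                arrival += transfer_time(mapping[pred], dev, edges[(pred, task)])
            ready = max(ready, arrival)
        # Assumption: a device runs one task at a time.
        start = max(ready, device_free.get(dev, 0.0))
        finish[task] = start + exec_time[task][dev]
        device_free[dev] = finish[task]

    return max(finish.values(), default=0.0)


# Hypothetical diamond-shaped graph: a -> {b, c} -> d
exec_time = {
    "a": {"cpu": 2.0},
    "b": {"cpu": 4.0, "gpu": 1.0},
    "c": {"cpu": 3.0, "fpga": 2.0},
    "d": {"cpu": 1.0},
}
edges = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 1.0}


def linear_transfer(src, dst, size):
    # Assumed cost model: 0.5 time units per unit of data.
    return 0.5 * size


all_cpu = predict_makespan(
    exec_time, edges, {t: "cpu" for t in exec_time}, linear_transfer
)
offloaded = predict_makespan(
    exec_time,
    edges,
    {"a": "cpu", "b": "gpu", "c": "fpga", "d": "cpu"},
    linear_transfer,
)
print(all_cpu, offloaded)  # 10.0 6.0
```

In this toy example, moving `b` and `c` to accelerators shortens the predicted makespan from 10.0 to 6.0 despite the added transfers. A function of this kind is cheap enough to evaluate many candidate mappings, but how closely it tracks measured makespans is exactly the question such a simplification leaves open.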
