From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation

Linus Bantel; Moritz Strack; Alexander Strack; Dirk Pflüger

TL;DR

This paper explores how LLMs generate task-based parallel code from three kinds of input prompts (natural language problem descriptions, sequential reference implementations, and parallel pseudo code) across three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX.

Abstract

Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts: natural language problem descriptions, sequential reference implementations, and parallel pseudo code. We focus on three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX. Each framework offers different levels of abstraction and control for task execution. We evaluate LLM-generated solutions for correctness and scalability. Our results reveal both strengths and weaknesses of LLMs with regard to problem complexity and framework. Finally, we discuss what these findings mean for future LLM-assisted development in high-performance and scientific computing.
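
To make "task-based parallel code" concrete, the following is a minimal sketch of a divide-and-conquer sum that spawns tasks with `std::async` from the C++ standard library. It is an illustration only: the function name `task_sum`, the `cutoff` parameter, and the choice of problem are hypothetical and are not taken from the paper's benchmarks, and whether the paper's "C++ standard parallelism" refers to this particular facility is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the paper): sums v[lo, hi) by recursively
// splitting the range and running the left half as an asynchronous task.
long long task_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi,
                   std::size_t cutoff = 1024) {
    assert(lo <= hi && hi <= v.size());

    // Below the cutoff, fall back to a sequential sum to limit task overhead.
    if (hi - lo <= cutoff) {
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        return std::accumulate(first, last, 0LL);
    }

    const std::size_t mid = lo + (hi - lo) / 2;

    // Spawn the left half as a task; the future carries its result.
    std::future<long long> left = std::async(
        std::launch::async,
        [&v, lo, mid, cutoff] { return task_sum(v, lo, mid, cutoff); });

    // Compute the right half on the current thread, then join with the task.
    const long long right = task_sum(v, mid, hi, cutoff);
    return left.get() + right;
}
```

The cutoff controls task granularity: a value that is too small creates many short-lived tasks, while a value that is too large leaves cores idle. Comparable structures could presumably be expressed with OpenMP tasks or HPX futures, which differ in how much control they give over task scheduling.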

Paper Structure (12 sections, 4 equations, 4 figures, 2 tables)

Introduction
Related Work
Methodology
Evaluation Metrics
Correctness
Complexity
Scaling
Results
Correctness
Complexity
Scaling
Conclusion and Outlook

Figures (4)

Figure 1: Pass@1 metric for the respective LLM without any correction.
Figure 2: Pass@1 metric for the level of fixes, such that the program is correct and runs
Figure 3: Scaling of the generated code grouped by benchmark problem.
Figure 4: Scaling of the generated code grouped by framework.
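
The captions of Figures 1 and 2 refer to the Pass@1 metric without defining it. As a reading aid, and assuming the paper follows the commonly used unbiased pass@k estimator from the code-generation literature (this excerpt does not confirm it), the metric for a problem with $n$ generated samples, of which $c$ are correct, can be written as:

\[
\text{pass@}k = \mathbb{E}_{\text{problems}}\left[ 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \right],
\qquad
\text{pass@}1 = \mathbb{E}_{\text{problems}}\left[ \frac{c}{n} \right].
\]

Under this assumption, Pass@1 is simply the expected fraction of generated samples that are correct. The symbols $n$, $c$, and $k$ are introduced here for illustration and may differ from the paper's notation.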
