AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs

Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung Phan, Le Chen, Mihai Capotă, Theodore Willke, Nesreen K. Ahmed, Ali Jannesari

TL;DR

AutoParLLM introduces a GNN-guided context-generation framework to enhance zero-shot OpenMP code parallelization via large language models. By training GNNs on PerfoGraph representations to predict parallelism and patterns, AutoParLLM produces context-rich prompts that guide LLMs to generate correct and efficient parallel code, quantified by a novel OMPScore metric. Evaluations on NAS and Rodinia show substantial improvements in CodeBERTScore and directive quality, along with notable speedups and enhanced developer productivity; OpenACC extension results demonstrate adaptability to other parallel models. The work advances scaffolding for LLM-assisted HPC code generation by tightly integrating graph-based program analysis with prompt engineering, yielding practical benefits for parallelization tasks.
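The TL;DR describes the core idea: predictions from a GNN about a loop are turned into extra context for the LLM prompt. The sketch below is only a rough illustration of that idea. It assumes a per-loop prediction made of a parallel/not-parallel flag and a pattern label (`do-all`, `reduction`, `private`), and it renders that prediction as a natural-language hint placed before the code. `LoopPrediction`, `build_prompt`, the label names and the hint wording are all hypothetical. They are not AutoParLLM's actual prompt templates or API, and the GNN itself (trained on PerfoGraph representations) is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoopPrediction:
    """Hypothetical container for what a GNN classifier might output for one loop."""
    parallel: bool                                       # predicted: safe to parallelize?
    pattern: str = "do-all"                              # hypothetical labels: "do-all", "reduction", "private"
    variables: List[str] = field(default_factory=list)  # variables the pattern refers to


def build_prompt(loop_source: str, pred: LoopPrediction) -> str:
    """Assemble a zero-shot prompt whose extra context comes from the GNN prediction."""
    lines = ["Parallelize the following C loop with OpenMP if it is safe to do so."]
    if not pred.parallel:
        # Steer the LLM away from inserting a directive on a loop predicted to be unsafe.
        lines.append("Hint: the loop was predicted NOT parallelizable; leave it serial.")
    elif pred.pattern == "reduction":
        lines.append(f"Hint: the loop is parallel with a reduction on {', '.join(pred.variables)}.")
    elif pred.pattern == "private":
        lines.append(f"Hint: the loop is parallel; make {', '.join(pred.variables)} private.")
    else:
        lines.append("Hint: the loop is a do-all loop with independent iterations.")
    lines.append("Code:")
    lines.append(loop_source)
    return "\n".join(lines)


if __name__ == "__main__":
    loop = "for (i = 0; i < n; i++) sum += a[i];"
    pred = LoopPrediction(parallel=True, pattern="reduction", variables=["sum"])
    print(build_prompt(loop, pred))
```

Keeping the hint separate from the code lets one template cover both outcomes. A loop predicted to be unsafe still reaches the LLM, but with an explicit instruction not to parallelize it.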

Abstract

In-Context Learning (ICL) has been shown to be a powerful technique for augmenting the capabilities of LLMs across a diverse range of tasks. This work proposes AutoParLLM, a novel way to generate context using guidance from graph neural networks (GNNs) in order to generate efficient parallel code. We evaluate AutoParLLM on 12 applications from two well-known benchmark suites of parallel code: the NAS Parallel Benchmark and the Rodinia Benchmark. Our results show that AutoParLLM improves state-of-the-art LLMs (e.g., GPT-4) by 19.9% on NAS and 6.48% on Rodinia in terms of CodeBERTScore for the task of parallel code generation. Moreover, AutoParLLM improves the ability of the most powerful LLM to date, GPT-4, achieving approximately 17% (on NAS) and 16% (on Rodinia) better speedup. In addition, we propose OMPScore for evaluating the quality of parallel code and demonstrate its effectiveness for this purpose. AutoParLLM is available at https://github.com/quazirafi/AutoParLLM.git.
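The abstract reports speedup improvements as percentages, but this excerpt does not give the formulas behind them. The sketch below assumes the conventional definitions: speedup is serial runtime divided by parallel runtime, and the gain is the relative change of one speedup over a baseline speedup. The function names and the timings in the example are hypothetical and illustrative only. They are not results from the paper.

```python
def speedup(serial_time_s: float, parallel_time_s: float) -> float:
    """Conventional speedup: serial runtime divided by parallel runtime (assumed definition)."""
    if serial_time_s <= 0 or parallel_time_s <= 0:
        raise ValueError("runtimes must be positive")
    return serial_time_s / parallel_time_s


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical runtimes in seconds for one application (illustrative numbers only).
    serial = 12.0
    baseline_parallel = 5.0   # code generated by the plain LLM
    guided_parallel = 4.0     # code generated with GNN-guided context
    s_base = speedup(serial, baseline_parallel)    # 2.4x
    s_guided = speedup(serial, guided_parallel)    # 3.0x
    print(f"{relative_gain_pct(s_base, s_guided):.1f}% better speedup")  # prints "25.0% better speedup"
```

Under these assumptions, a "better speedup" figure compares two speedups, not two raw runtimes. In the example, cutting the parallel runtime by 20% (5.0 s to 4.0 s) yields a 25% speedup gain.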

Paper Structure (49 sections, 1 equation, 13 figures, 12 tables)


Figures (13)

  • Figure 1: Effect of AutoParLLM. ALLM = AutoParLLM applied (green bars). With AutoParLLM, GPT-4's average speedup gain (%) improves by 17.7% (Intel) and 17.2% (AMD) on NAS and by 16.1% (Intel) and 19.5% (AMD) on Rodinia. LLMs are prompted in few-shot settings, and speedups are reported using 4 threads. (A comparison with more LLMs is in the Appendix.)
  • Figure 2: Overview of the AutoParLLM workflow.
  • Figure 3: Overview of OMPScore.
  • Figure 4: Speedup gain across individual applications in the NAS Parallel Benchmark. ALLM-GPT-4 achieves its largest gain over GPT-4 on CG: 24.7% better speedup on Intel CPUs and 28.6% on AMD CPUs.
  • Figure 5: Speedup gain across individual applications in the Rodinia-3.1 Benchmark. ALLM-GPT-4 achieves its largest gain over GPT-4 on Heartwall: 40.6% better speedup on Intel CPUs and 30.2% on AMD CPUs.