OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation

Tal Kadosh; Niranjan Hasabnis; Prema Soundararajan; Vy A. Vo; Mihai Capota; Nesreen Ahmed; Yuval Pinter; Gal Oren

TL;DR

This work argues that general LLMs underperform on HPC-specific tasks and introduces MonoCoder, a domain-specialized HPC code LM of approximately 0.9B parameters trained on the HPCorpus. It pairs MonoCoder with Tokompiler, an AST-based anonymization preprocessor, to assess code-structure understanding and improve OpenMP pragma generation through downstream fine-tuning on HPCorpusOMP. Evaluation across perplexity, CodeBLEU-based code generation, and OpenMP pragma generation demonstrates that the domain-specific model achieves competitive or superior performance to larger general LLMs while offering robustness to semantic anonymization. The approach enables efficient, offline HPC code analysis and automatic parallelization support, with potential for broader HPC-domain adoption through future extensions such as incorporating data-flow and IR representations and expanding to additional HPC tasks.

Abstract

Manual parallelization of code remains a significant challenge due to the complexities of modern software systems and the widespread adoption of multi-core architectures. This paper introduces OMPar, an AI-driven tool designed to automate the parallelization of C/C++ code using OpenMP pragmas. OMPar integrates Large Language Models (LLMs) through two key components: OMPify, which assesses loop parallelization potential, and MonoCoder-OMP, a new fine-tuned model which generates precise OpenMP pragmas. The evaluation of OMPar follows the same rigorous process applied to traditional tools like source-to-source AutoPar and ICPC compilers: (1) ensuring the generated code compiles and runs correctly in serial form, (2) assessing performance with the gradual addition of threads and corresponding physical cores, and (3) verifying and validating the correctness of the code's output. Benchmarks from HeCBench and ParEval are used to evaluate accuracy and performance. Experimental results demonstrate that OMPar significantly outperforms traditional methods, achieving higher accuracy in identifying parallelizable loops and generating efficient pragmas. Beyond accuracy, OMPar offers advantages such as the ability to work on partial or incomplete codebases and the capacity to continuously learn from new code patterns, enhancing its parallelization capabilities over time. These results underscore the potential of LLMs in revolutionizing automatic parallelization techniques, paving the way for more efficient and scalable parallel computing systems.
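The kind of transformation the abstract describes, together with evaluation step (3), can be illustrated with a small sketch in C. It is not output from OMPar or MonoCoder-OMP: the function names (`fill_inputs`, `dot_serial`, `dot_parallel`, `outputs_match`) and the input data are hypothetical and chosen only for illustration. The pragma shown is the sort of annotation a pragma generator might emit for a reduction loop, and the serial version serves as the reference when validating the parallel output.

```c
#include <assert.h>

/* Fill two illustrative input vectors so that a[i] * b[i] == i.
 * (Hypothetical test data, not taken from HeCBench or ParEval.) */
void fill_inputs(double *a, double *b, int n) {
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        a[i] = 0.5 * i;
        b[i] = 2.0;
    }
}

/* Serial reference: the original, unannotated loop. */
double dot_serial(const double *a, const double *b, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Candidate parallel version. Iterations are independent except for the
 * accumulation into sum, so a reduction clause is needed to avoid a race. */
double dot_parallel(const double *a, const double *b, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Output validation: returns 1 when the two results agree within a relative
 * tolerance. A tolerance is used because a parallel reduction may add the
 * terms in a different order than the serial loop. */
int outputs_match(double ref, double got, double rel_tol) {
    double diff = ref - got;
    if (diff < 0.0) diff = -diff;
    double scale = ref < 0.0 ? -ref : ref;
    if (scale == 0.0) scale = 1.0;
    return diff <= rel_tol * scale;
}
```

With GCC, building with `-fopenmp` enables the pragma; without that flag the pragma is ignored and `dot_parallel` runs serially, which corresponds to evaluation step (1). Running the binary with increasing values of the `OMP_NUM_THREADS` environment variable is one way to approximate the thread-scaling measurement in step (2).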

Paper Structure (13 sections, 12 figures, 2 tables)

Introduction
Motivations for Building Domain-specific LMs
Characteristics of General vs. HPC Programmers
HPCorpus: HPC Code Corpus
MonoCoder: An HPC-specific code LM
Tokompiler: Preprocessing to Evaluate Code Structure Understanding
MonoCoder Evaluation
Language Modeling Evaluation
Code Generation
Downstream HPC Task: OpenMP Pragma Generation
Related Work
Conclusion
Future Work

Figures (12)

Figure 1: ChatGPT example of OpenMP Parallelization Task: "Can this code be parallelized".
Figure 2: An example showing code completion output of StarCoder. Black text shows the input prompt, blue is the generated completion.
Figure 3: MonoCoder training and validation loss curves.
Figure 4: GPT3, Codex, GPT-3.5&4 and Tokompiler tokenization of the Pi C code. As models are enhanced, tokenization is also improved, but it is still imperfect for domain-specific language primitives and numbers.
Figure 5: Tokompiler pipeline overview: Given a source code, the code turns into a semantic-less version using AST knowledge, and eventually, the lexicalized tokens are fed into MonoCoder.
...and 7 more figures
