OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms

Arijit Bhattacharjee; Ali TehraniJamsaz; Le Chen; Niranjan Hasabnis; Mihai Capota; Nesreen Ahmed; Ali Jannesari

TL;DR

OMPILOT targets automatic translation of C++ to OpenMP with a domain-specific encoder-decoder transformer and a suite of pretraining objectives: Masked Language Modeling, Syntax Structure Annotation, and a Weighted Denoising Auto-Encoding loss. It combines these with Back Translation and Progressive Fine-Tuning to improve robustness, and introduces OMPBLEU, a composite metric that captures OpenMP-specific correctness beyond what conventional code-translation metrics measure. The reported results show OMPILOT outperforming larger baselines in clause-generation accuracy and semantic correctness, with up to 28× faster inference because it requires no natural-language prompt. For high-performance computing, the practical contribution is twofold: scalable, verifiable OpenMP code generation, and a specialized evaluation framework aligned with parallel correctness and compilation viability.

Abstract

Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages. While originally developed for natural language processing, LLMs have shown strong capabilities in modeling programming language syntax and semantics, outperforming traditional rule-based systems in both accuracy and flexibility. These models have streamlined cross-language conversion, reduced development overhead, and accelerated legacy code migration. In this paper, we introduce OMPILOT, a novel domain-specific encoder-decoder transformer tailored for translating C++ code into OpenMP, enabling effective shared-memory parallelization. OMPILOT leverages custom pre-training objectives that incorporate the semantics of parallel constructs and combines both unsupervised and supervised learning strategies to improve code translation robustness. Unlike previous work that focused primarily on loop-level transformations, OMPILOT operates at the function level to capture a wider semantic context. To evaluate our approach, we propose OMPBLEU, a novel composite metric specifically crafted to assess the correctness and quality of OpenMP parallel constructs, addressing limitations in conventional translation metrics.
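To make the translation task concrete, the sketch below pairs a serial C++ function with a hand-written OpenMP version of the kind a function-level translator would be expected to produce. It is an illustration only, not output from OMPILOT: the function names `sum_of_squares_serial` and `sum_of_squares_omp` are hypothetical, and the choice of a `reduction(+:total)` clause is our assumption about the appropriate parallelization for this loop.

```cpp
#include <cstddef>
#include <vector>

// Serial input: the kind of function-level C++ a translator would receive.
// (Hypothetical example, not taken from the paper's dataset.)
long long sum_of_squares_serial(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += static_cast<long long>(v[i]) * v[i];
    }
    return total;
}

// Hand-written OpenMP target for the same function.
// The loop iterations are independent except for the accumulation into
// `total`, so a reduction clause gives each thread a private partial sum
// and combines the partial sums when the loop ends.
long long sum_of_squares_omp(const std::vector<int>& v) {
    long long total = 0;
    // A signed loop index keeps the loop in OpenMP's canonical form
    // even on older OpenMP implementations.
    const long long n = static_cast<long long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        const int x = v[static_cast<std::size_t>(i)];
        total += static_cast<long long>(x) * x;
    }
    return total;
}
```

Built with an OpenMP-enabled compiler (for example, `-fopenmp` with GCC or Clang), the second function runs the loop across threads. Without that flag the pragma is ignored and the function runs serially with the same result. Picking the right clause, here `reduction` rather than a bare `parallel for` that would race on `total`, is the sort of OpenMP-specific correctness that a clause-aware metric is meant to reward.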
