
LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes

Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor

TL;DR

The paper tackles the scarcity of training data for science-focused LLMs that generate parallel HPC code. It introduces LASSI, an automated self-correcting pipeline that translates between OpenMP target offload and CUDA by bootstrapping existing LLMs and feeding compilation and execution errors back to them. Across ten HeCBench applications and four LLMs, 80% of OpenMP→CUDA and 85% of CUDA→OpenMP translations produce the expected output, and approximately 78% and 62%, respectively, run within 10% of, or faster than, the original benchmark code in the target language. The work shows that domain-aware prompting and autonomous self-correction can yield executable parallel codes, enabling scalable synthetic data generation for training science-oriented LLMs.

Abstract

This paper presents a novel approach to sourcing large volumes of training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the range of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework called LASSI, designed to translate between parallel programming languages by bootstrapping existing closed- or open-source LLMs. LASSI incorporates autonomous enhancement through self-correcting loops in which errors encountered during the compilation and execution of generated code are fed back to the LLM through guided prompting for debugging and refactoring. To validate LASSI, we highlight the bi-directional translation of existing GPU benchmarks between OpenMP target offload and CUDA. Evaluating LASSI with different application codes across four LLMs demonstrates its effectiveness for generating executable parallel codes, with 80% of OpenMP to CUDA translations and 85% of CUDA to OpenMP translations producing the expected output. We also observe that approximately 78% of OpenMP to CUDA translations and 62% of CUDA to OpenMP translations execute within 10% of, or faster than, the runtime of the original benchmark code in the same language.
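
The self-correcting loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `translate_with_self_correction`, the `StepResult` type, the prompt wording, and the `max_attempts` limit are all hypothetical, and the LLM, compiler, and executor are passed in as plain callables so the sketch runs without a GPU toolchain. Re-prompting on an output mismatch is also an assumption; the abstract only states that compilation and execution errors are fed back.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    ok: bool      # True if the compile or run step succeeded
    output: str   # stdout on success, error log on failure


def translate_with_self_correction(
    source_code: str,
    llm: Callable[[str], str],
    compile_fn: Callable[[str], StepResult],
    run_fn: Callable[[str], StepResult],
    expected_output: str,
    max_attempts: int = 5,  # hypothetical retry budget
) -> Optional[str]:
    """Return a translation that compiles, runs, and matches the expected
    output, or None if the retry budget is exhausted."""
    prompt = f"Translate this code to the target language:\n{source_code}"
    for _ in range(max_attempts):
        candidate = llm(prompt)

        # Compilation feedback: send the compiler log back to the LLM.
        compiled = compile_fn(candidate)
        if not compiled.ok:
            prompt = (
                "The code failed to compile. Fix it.\n"
                f"Compiler log:\n{compiled.output}\nCode:\n{candidate}"
            )
            continue

        # Execution feedback: send the runtime error back to the LLM.
        ran = run_fn(candidate)
        if not ran.ok:
            prompt = (
                "The code failed at runtime. Fix it.\n"
                f"Runtime log:\n{ran.output}\nCode:\n{candidate}"
            )
            continue

        if ran.output.strip() == expected_output.strip():
            return candidate

        # Assumption: an output mismatch also triggers a retry.
        prompt = (
            "The code ran but produced unexpected output. Fix it.\n"
            f"Expected:\n{expected_output}\nGot:\n{ran.output}\n"
            f"Code:\n{candidate}"
        )
    return None
```

In a real setting, `compile_fn` and `run_fn` would wrap the target compiler and the resulting binary, and `llm` would wrap a model API call; keeping them injectable makes the control flow easy to test in isolation.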

Paper Structure (16 sections, 1 figure, 7 tables)