LLM-HPC++: Evaluating LLM-Generated Modern C++ and MPI+OpenMP Codes for Scalable Mandelbrot Set Computation

Patrick Diehl, Noujoud Nader, Deepti Gupta

Abstract

Parallel programming remains one of the most challenging aspects of High-Performance Computing (HPC), requiring deep knowledge of synchronization, communication, and memory models. While modern C++ standards and frameworks like OpenMP and MPI have simplified parallelism, mastering these paradigms is still complex. Recently, Large Language Models (LLMs) have shown promise in automating code generation, but their effectiveness in producing correct and efficient HPC code is not well understood. In this work, we systematically evaluate leading LLMs including ChatGPT 4 and 5, Claude, and LLaMA on the task of generating C++ implementations of the Mandelbrot set using shared-memory, directive-based, and distributed-memory paradigms. Each generated program is compiled and executed with GCC 11.5.0 to assess its correctness, robustness, and scalability. Results show that ChatGPT-4 and ChatGPT-5 achieve strong syntactic precision and scalable performance.
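For orientation, the directive-based variant of the task can be sketched as an escape-time kernel whose independent rows are distributed with an OpenMP pragma. This is a minimal illustration and not one of the LLM-generated codes evaluated in this work. The function names `mandelbrot_iterations` and `render_mandelbrot`, the escape radius of 2, and the reading of the scale as the width of the viewed region in the complex plane are all assumptions made for this example.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Escape-time iteration count for one point c.
// Hypothetical helper; the escape radius of 2 (squared norm 4) is an assumption.
int mandelbrot_iterations(std::complex<double> c, int max_iter) {
    std::complex<double> z{0.0, 0.0};
    int n = 0;
    while (n < max_iter && std::norm(z) <= 4.0) {
        z = z * z + c;
        ++n;
    }
    return n;
}

// Fills a width x height grid of iteration counts centered at
// (center_re, center_im). `scale` is assumed to be the width of the
// viewed region in the complex plane.
std::vector<int> render_mandelbrot(int width, int height,
                                   double center_re, double center_im,
                                   double scale, int max_iter) {
    std::vector<int> image(static_cast<std::size_t>(width) * height);
    const double step = scale / width;
    // Rows are independent, so they can be shared among threads.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double re = center_re + (x - width / 2.0) * step;
            const double im = center_im + (y - height / 2.0) * step;
            image[static_cast<std::size_t>(y) * width + x] =
                mandelbrot_iterations({re, im}, max_iter);
        }
    }
    return image;
}
```

Under these assumptions, a call such as `render_mandelbrot(1920, 1080, -0.75, 0.0, 3.0, 1000)` would produce the iteration counts for a view similar to the one described for Figure 2; mapping the counts to colors is left out here.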

Paper Structure

This paper contains 28 sections, 1 equation, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Timeline of the features added to the C++ standard with respect to parallelism. Adapted from 10.1007/978-3-031-32316-4_3.
  • Figure 2: Visualization of the Mandelbrot set with $1000$ iterations, a scale of $3$, $c_\text{real}=-0.75$, and $c_\text{img}=0.0$, generated by (a) the coroutine code and (b) the MPI+OpenMP code using ChatGPT 5. Puzzlingly, the same model used different color mapping functions for the Mandelbrot set in the two codes. The first published image of the Mandelbrot set in mandelbrot2004fractal can serve as a reference for correctness.
  • Figure 3: Lines of code (LOC), including comments, generated by the AI models for all four shared-memory examples: (a) coroutines, (b) asynchronous programming, (c) parallel algorithms, and (d) OpenMP. We used the tool cloc to obtain the lines of code. Note that the plot for coroutines (a) differs from the others because this was the only generated code where we had to add 37 lines to fix the LLaMA-generated code. For all other codes, the fixes were smaller and mostly involved changing existing lines of code.
  • Figure 4: Quality (poor to good) and effort (easy to difficult) obtained from the COCOMO model using the estimated months. ChatGPT 5 is shown in black, ChatGPT 4 in blue, Claude in green, and LLaMA in grey.
  • Figure 5: Quality from poor to good and the effort from easy to difficult obtained by the COCOMO model using the estimated months.
  • ...and 2 more figures