Performance-Aligned LLMs for Generating Fast Code

Daniel Nichols; Pranav Polasam; Harshitha Menon; Aniruddha Marathe; Todd Gamblin; Abhinav Bhatele

TL;DR

The paper addresses the slow runtime of code produced by large language models by introducing performance-aware fine-tuning. It combines a structured CodeContests-Perf dataset with synthetic data and proposes two methods, reinforcement learning with performance feedback (RLPF) and direct performance alignment (DPA), to align LLM outputs with faster code while preserving correctness. On a set of benchmark tasks, the fine-tuned model raises the expected speedup of generated code over base models from 0.9 to 1.6 for serial code and from 1.9 to 4.5 for OpenMP code, and the fine-tuned models also improve correctness over the baseline on the code generation tasks. The work demonstrates a practical path to integrating performance considerations into AI-assisted software development for both serial and parallel HPC workloads.

Abstract

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors, including the algorithm, its implementation, and the hardware, among others. Causes of poor performance can originate from disparate sources and be difficult to diagnose. Recent years have seen a multitude of works that use large language models (LLMs) to assist in software development tasks. However, these tools are trained to model the distribution of code as text, and are not specifically designed to understand performance aspects of code. In this work, we introduce a reinforcement learning based methodology to align the outputs of code LLMs with performance. This allows us to build upon the current code modeling capabilities of LLMs and extend them to generate better performing code. We demonstrate that our fine-tuned model improves the expected speedup of generated code over base models for a set of benchmark tasks from 0.9 to 1.6 for serial code and 1.9 to 4.5 for OpenMP code.
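The abstract reports speedups without defining the metric on this page. As a rough illustration only, the sketch below assumes that speedup is the ratio of a baseline runtime to the runtime of a generated sample, and that an "expected" value is a plain average over samples. The function names and the averaging rule are assumptions made here, not the paper's definition.

```python
from statistics import mean


def speedup(baseline_seconds: float, generated_seconds: float) -> float:
    """Ratio of baseline runtime to generated-code runtime (assumed definition).

    Values below 1.0 mean the generated code is slower than the baseline.
    """
    return baseline_seconds / generated_seconds


def mean_speedup(baseline_seconds: float, generated_runtimes: list[float]) -> float:
    """Average speedup over several generated samples (assumed aggregation)."""
    return mean(speedup(baseline_seconds, t) for t in generated_runtimes)
```

Under this reading, a value of 0.9 would mean that generated code is on average slower than the baseline, while 1.6 would mean it is faster.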

Paper Structure (32 sections, 10 equations, 9 figures, 2 tables)

Introduction
Background
Large Language Models for Code
Reinforcement Learning and Proximal Policy Optimization
Overview of Methodology
Data Collection and Labeling
Performance Dataset Collection
Synthetic Data Generation
Aligning LLMs to Generate Faster Code: Proposed Fine-Tuning Approaches
Supervised Learning
Reinforcement Learning with Performance Feedback
Direct Performance Alignment
Evaluation Tasks
Code Generation
Code Optimization
...and 17 more sections

Figures (9)

Figure 1: An overview of the proposed methodology. We first collect a large dataset of fast and slow code pairs using coding contest submissions and synthetically generated data. Then we fine-tune three different LLMs on this data to generate faster code. Finally, we evaluate the fine-tuned models on code generation and optimization tasks.
Figure 2: An overview of the reward model fine-tuning process. The reward model outputs a reward for a fast and slow code sample. The loss function uses these rewards alongside runtime data to update the weights of the model so that its predicted rewards move farther apart for faster and slower code scaled by the runtime speedup.
Figure 3: The RLPF fine-tuning process. A prompt is given to the model and a reward is calculated based on the code it generates. Additionally, the KL-divergence between a reference model and the fine-tuned model is included in the reward to prevent deviating too far from the original distribution. Finally, PPO is used to update the model's parameters based on the reward.
Figure 4: The DPA fine-tuning process. The model being fine-tuned and a reference model are used to generate probabilities for a fast and slow code sample. These probabilities, combined with runtime data, are used to compute a loss and update the model's parameters.
Figure 5: Correctness results for each model on the code generation tasks. Each of the fine-tuned models shows an improvement in correctness over the baseline model with the DS+RLPF model showing the most improvement.
...and 4 more figures
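The captions for Figures 2 to 4 describe three training signals in words. The sketch below shows one way each could look on scalar inputs. It is an illustration under stated assumptions, not the paper's equations: the log-speedup margin, the logistic pairwise form, the single-sample KL estimate, and the default `beta` values are all choices made here, and every function name is hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pair_loss(r_fast: float, r_slow: float,
                     t_fast: float, t_slow: float) -> float:
    """Pairwise reward-model loss in the spirit of Figure 2.

    Assumption: the reward gap is pushed past a margin equal to the
    log of the runtime speedup t_slow / t_fast.
    """
    margin = math.log(t_slow / t_fast)
    return -log_sigmoid(r_fast - r_slow - margin)


def kl_penalized_reward(reward: float, logp_policy: float,
                        logp_ref: float, beta: float = 0.1) -> float:
    """Reward with a KL-style penalty in the spirit of Figure 3.

    Assumption: the KL term is estimated from a single sample as the
    difference of log-probabilities under the policy and the reference.
    """
    return reward - beta * (logp_policy - logp_ref)


def dpa_style_loss(logp_fast: float, logp_slow: float,
                   ref_logp_fast: float, ref_logp_slow: float,
                   t_fast: float, t_slow: float, beta: float = 0.1) -> float:
    """Preference loss with a runtime margin in the spirit of Figure 4.

    Assumption: a DPO-like objective on policy/reference log-probability
    ratios, shifted by the log speedup of the fast sample over the slow one.
    """
    preference = beta * ((logp_fast - ref_logp_fast)
                         - (logp_slow - ref_logp_slow))
    margin = math.log(t_slow / t_fast)
    return -log_sigmoid(preference - margin)
```

In each pairwise loss, a larger runtime gap between the two samples raises the margin, so the model must separate the fast and slow sample more strongly before the loss becomes small. In an actual PPO setup the reward from `kl_penalized_reward` would feed the policy update; that step is omitted here.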
