Compiler generated feedback for Large Language Models
Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather
TL;DR
This work introduces a compiler-generated feedback loop where Large Language Models optimize LLVM IR by predicting the best optimization passes, target instruction counts, and an optimized IR, followed by compiler-based feedback that validates and refines the predictions. Three feedback forms (Short, Long, Fast) are explored, with a 7B-parameter LLaMa-2-based model trained for 20,000 steps on 64 GPUs. Results show that feedback-enhanced approaches improve over the -Oz baseline in single-shot settings (up to 0.53%), while traditional sampling by the original model can achieve up to 98% of autotuner performance with 100 samples; however, iterative feedback does not consistently beat sampling. The study demonstrates the viability and limitations of integrating LLMs with compiler optimization, highlighting sampling as a particularly potent tool and outlining future directions for smarter feedback and training data derived from feedback-driven prompts.
Abstract
We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly. The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs. We then compile the input with the generated optimization passes and check whether the predicted instruction count is correct and whether the generated IR compiles and corresponds to the compiled code. We provide this feedback to the LLM and give it another chance to optimize the code. This approach adds an extra 0.53% improvement over -Oz to the original model. Even though adding more information through feedback seems intuitive, simple sampling techniques achieve much higher performance given 10 or more samples.
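The three checks described in the abstract can be sketched as a small function that turns compiler observations into feedback messages. This is only an illustrative sketch, not the authors' implementation: the `CompilerCheck` record, its field names, and the message wording are all hypothetical, and the values are assumed to have been obtained already by compiling the input with the predicted passes.

```python
from dataclasses import dataclass


@dataclass
class CompilerCheck:
    """Facts gathered after compiling the input with the predicted passes.

    Hypothetical schema: the field names are assumptions for illustration.
    """
    predicted_count: int  # instruction count predicted by the model
    actual_count: int     # instruction count measured after compilation
    ir_compiles: bool     # whether the model's generated IR compiles
    ir_matches: bool      # whether the generated IR corresponds to the compiled code


def build_feedback(check: CompilerCheck) -> list[str]:
    """Return one message per failed check; an empty list means all checks passed."""
    messages = []
    if check.predicted_count != check.actual_count:
        messages.append(
            f"Predicted instruction count {check.predicted_count} is wrong; "
            f"the compiler produced {check.actual_count}."
        )
    if not check.ir_compiles:
        messages.append("The generated IR does not compile.")
    elif not check.ir_matches:
        # Only meaningful when the generated IR compiles at all.
        messages.append("The generated IR differs from the compiled code.")
    return messages
```

In a feedback loop along these lines, the returned messages would be appended to the next prompt so the model gets another attempt; an empty list would mean there is nothing to correct.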
