Exploring the Feasibility of End-to-End Large Language Model as a Compiler
Hongbin Zhang, Shihao Gao, Yang Liu, Mingjie Xing, Yanjun Wu, Chen Zhao
TL;DR
This work investigates end-to-end LLMs as compilers (LaaC) by introducing the CompilerEval dataset and framework, which assess how well LLMs generate assembly code from source code. It shows that mainstream LLMs can produce executable assembly for simple kernels, but overall compilation success remains limited, even though prompt engineering, model scaling, and reasoning methods improve the generated code. The authors propose a LaaC framework built around a knowledge-base-driven, reasoning-enabled pipeline with debugger integration, and they outline research directions in model training under compilation constraints, scalable multi-language and multi-platform infrastructure, and improved debugging support. If realized, LaaC could simplify compiler design, reduce development costs, and enable rapid adaptation to new languages and architectures, potentially triggering a paradigm shift in compiler technology.
Abstract
In recent years, end-to-end Large Language Model (LLM) technology has shown substantial advantages across various domains. Compilers, as critical system software and infrastructure, are responsible for transforming source code into target code. While LLMs have been used to assist in compiler development and maintenance, their potential as end-to-end compilers remains largely unexplored. This paper explores the feasibility of LLM as a Compiler (LaaC) and its future directions. We designed the CompilerEval dataset and framework specifically to evaluate the capabilities of mainstream LLMs in source code comprehension and assembly code generation. In the evaluation, we analyzed the errors that LLMs make, explored multiple methods to improve LLM-generated code, and evaluated cross-platform compilation capabilities. Experimental results demonstrate that LLMs exhibit basic capabilities as compilers but currently achieve low compilation success rates. Optimizing prompts, scaling up the model, and incorporating reasoning methods can significantly improve the quality of the assembly code that LLMs generate. Based on these findings, we maintain an optimistic outlook for LaaC and propose practical architectural designs and future research directions. We believe that with targeted training, knowledge-rich prompts, and specialized infrastructure, LaaC has the potential to generate high-quality assembly code and drive a paradigm shift in the field of compilation.
