ACPO: AI-Enabled Compiler Framework
Amir H. Ashouri, Muhammad Asif Manzoor, Duc Minh Vu, Raymond Zhang, Colin Toft, Ziwen Wang, Angel Zhang, Bryan Chan, Tomasz S. Czajkowski, Yaoqing Gao
TL;DR
ACPO is a scalable, modular AI-enabled framework for LLVM that decouples ML from the compiler and provides ready-to-use feature extraction and APIs for plugging ML-guided decisions into optimization passes. Through two use cases, Loop Unrolling and Function Inlining, it demonstrates end-to-end training, persistent ML interfaces, and IPC-based inference, achieving average speedups over LLVM's O3 on Polybench of 4% with the Loop Unroll model alone and 4.5% with both models combined. The framework emphasizes reproducibility, generalization, and minimal compilation overhead, while enabling future multi-pass or multi-objective optimization. Overall, ACPO offers a practical pathway to integrate ML-driven heuristics into real-world compiler pipelines with minimal disruption to existing tooling.
Abstract
The key to optimizing a program's performance is deciding correctly when a compiler should apply a given transformation. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; although this has been recognized since the late 1990s, only recent advances in ML have made it practical to apply ML to compilers as an end-to-end framework. This paper presents ACPO: An AI-Enabled Compiler Framework, a novel framework that provides LLVM with simple and comprehensive tools for employing ML models in different optimization passes. We first present the high-level view, class hierarchy, and functionality of ACPO, then demonstrate two use cases by ML-enabling the Loop Unroll and Function Inlining passes used in LLVM's O3, and finally describe how ACPO can be leveraged to optimize other passes. Experimental results show that the ACPO model for Loop Unroll achieves average speedups of 4%, 3%, 5.4%, and 0.2% over LLVM's vanilla O3 optimization on Polybench, Coral-2, CoreMark, and Graph-500, respectively. Furthermore, with both the Function Inlining and Loop Unroll models enabled, ACPO provides a combined speedup of 4.5% on Polybench and 2.4% on Cbench over LLVM's O3.
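To make the idea of decoupling the ML model from the compiler concrete, the following is a minimal sketch in Python of how a pass might request an unroll decision from an external model and fall back to a built-in heuristic when inference fails. It is not ACPO's actual API: the feature names, the `StubUnrollModel` class, the JSON message format, and the `advise_unroll_factor` function are all hypothetical, and an in-process callable stands in for the real IPC channel.

```python
import json


def extract_loop_features(trip_count, body_instructions, has_calls):
    """Collect a few loop properties (hypothetical feature names)."""
    return {
        "trip_count": trip_count,
        "body_instructions": body_instructions,
        "has_calls": has_calls,
    }


class StubUnrollModel:
    """Stands in for a trained model running outside the compiler process.

    The thresholds below are arbitrary placeholders, not learned values.
    """

    def handle(self, request_bytes):
        features = json.loads(request_bytes.decode("utf-8"))
        if features["has_calls"] or features["body_instructions"] > 64:
            factor = 1  # leave the loop as it is
        elif features["trip_count"] >= 8:
            factor = 4
        else:
            factor = 2
        return json.dumps({"unroll_factor": factor}).encode("utf-8")


def default_heuristic(features):
    """Conservative fallback used when no valid model reply is available."""
    return 1


def advise_unroll_factor(features, transport):
    """Compiler side: serialize features, send them, and parse the reply.

    `transport` is any callable mapping request bytes to reply bytes; in a
    real deployment it would wrap a pipe or socket to the model process.
    """
    try:
        reply = transport(json.dumps(features).encode("utf-8"))
        factor = int(json.loads(reply.decode("utf-8"))["unroll_factor"])
    except (ValueError, KeyError, TypeError, OSError):
        return default_heuristic(features)
    return factor if factor >= 1 else default_heuristic(features)


if __name__ == "__main__":
    model = StubUnrollModel()
    hot_loop = extract_loop_features(trip_count=16, body_instructions=10, has_calls=False)
    print(advise_unroll_factor(hot_loop, model.handle))
```

Because the pass only sees bytes going out and bytes coming back, the model can be retrained or replaced without rebuilding the compiler, which is the property the framework description emphasizes.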
