AwareCompiler: Agentic Context-Aware Compiler Optimization via a Synergistic Knowledge-Data Driven Framework
Hongyu Lin, Haolin Pan, Haoran Luo, Yuchen Li, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu
TL;DR
AwareCompiler tackles the problem of selecting compiler optimization pass sequences that reduce code size by combining structured knowledge with data-driven learning in an agentic framework. It introduces a knowledge base spanning empirical, symbolic, and negative knowledge, together with a context-aware dataset used to train an adaptive reasoning agent that generates valid pass sequences under dependency and conflict constraints. The framework uses a two-stage training pipeline (supervised fine-tuning followed by reinforcement learning) with a composite reward that promotes format correctness, answer validity, and actual performance gains, and it reports state-of-the-art code size reductions on diverse benchmarks. The results show improved robustness against semantic misalignment, fewer invalid optimization attempts, and performance that scales to complex codebases, highlighting the practical potential of integrating knowledge with learning for compiler optimization.
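To make the composite reward and the dependency/conflict constraints concrete, the following is a minimal sketch, not the authors' implementation. The pass names, the `REQUIRES` and `CONFLICTS` tables, the reward weights, and the gating order (format, then validity, then size gain) are all assumptions made for illustration; the summary above only states that the reward combines format correctness, answer validity, and performance gains.

```python
# Hypothetical constraint tables: the pass names and relations are placeholders,
# not taken from the paper or from any real compiler.
REQUIRES = {"pass_b": {"pass_a"}}       # pass_b may only run after pass_a
CONFLICTS = {("pass_a", "pass_c")}      # pass_a and pass_c must not be combined


def is_valid_sequence(passes):
    """Check a pass sequence against the assumed dependency/conflict tables."""
    seen = set()
    for p in passes:
        # Every prerequisite must already have appeared earlier in the sequence.
        if not REQUIRES.get(p, set()) <= seen:
            return False
        # The new pass must not conflict with any pass applied so far.
        if any((p, q) in CONFLICTS or (q, p) in CONFLICTS for q in seen):
            return False
        seen.add(p)
    return True


def composite_reward(format_ok, passes, size_before, size_after,
                     w_format=0.2, w_valid=0.3, w_perf=0.5):
    """Combine format, validity, and size-reduction terms (weights are assumed)."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    if not format_ok:
        return 0.0                      # malformed output earns nothing
    reward = w_format
    if not is_valid_sequence(passes):
        return reward                   # well-formed but invalid sequence
    reward += w_valid
    # Relative code size reduction; regressions are clipped to zero gain.
    gain = max(0.0, (size_before - size_after) / size_before)
    return reward + w_perf * gain
```

Gating the later terms on the earlier ones is one possible design: it gives a partial, non-sparse signal for well-formed and valid answers even when the size reduction is small, which is one way to read the reward-sparsity concern raised in the abstract.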
Abstract
Compiler optimization is crucial for enhancing program performance by applying a sequence of optimization passes while maintaining correctness. Despite the promising potential of large language model (LLM)-based agents for software optimization, automating compiler optimization remains challenging due to: (1) semantic misalignment between abstract program representations and concrete optimization passes, (2) inefficient interaction mechanisms between agents and compiler environments, and (3) reward sparsity from the extensive decision-making process within large optimization spaces. This paper introduces AwareCompiler, an agentic framework for compiler optimization that addresses these challenges through three key innovations: structured knowledge integration and dataset construction, knowledge-driven adaptive pass generation, and a data-driven hybrid training pipeline. Experimental results on standard benchmarks demonstrate that AwareCompiler significantly outperforms existing baselines in both performance and efficiency, highlighting the effectiveness of our synergistic knowledge-data-driven approach. Our code is publicly available at https://github.com/LHY-24/AwareCompiler.
