A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler
Mohammed Tirichine, Nassim Ameur, Nazim Bendib, Iheb Nassim Aouadj, Bouchama Djad, Rafik Bouloudene, Riyadh Baghdadi
TL;DR
The paper addresses automatic optimization of MLIR code by wrapping loop-nest transformations in a reinforcement learning environment. It introduces MLIR RL, an environment featuring a multi-discrete action space and a level-pointers mechanism that manages the large search space of loop interchange. An agent is trained in this environment with PPO in an actor-critic framework, targeting the MLIR Linalg dialect. The authors demonstrate MLIR RL on DL and LQCD workloads, comparing against PyTorch, PyTorch JIT, Halide RL, and the Halide autoscheduler, and report favorable results in select domains, along with an ablation study that informs design choices. They also provide a public artifact enabling reproduction and further RL-driven exploration of loop-nest optimization within MLIR, highlighting its potential as a research infrastructure for ML-driven compiler optimization. The work advances automatic code optimization by furnishing a specialized, MLIR-integrated RL environment that may generalize across domains and dialects, potentially reducing manual tuning and unlocking new optimization strategies.
Abstract
Code optimization is a crucial task that aims to enhance code performance. However, this process is often tedious and complex, highlighting the necessity for automatic code optimization techniques. Reinforcement Learning (RL) has emerged as a promising approach for tackling such complex optimization problems. In this paper, we introduce MLIR RL, an RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research and enabling automatic code optimization. We propose a multi-discrete formulation of the action space, in which the action space is the Cartesian product of simpler action subspaces. We also propose a new method, called level pointers, to reduce the size of the action space related to the loop interchange transformation. This reduction enables more efficient and effective learning of the policy. To demonstrate the effectiveness of MLIR RL, we train an RL agent to optimize MLIR Linalg code, targeting CPU. The code is generated from two domain-specific frameworks: deep-learning models generated from PyTorch, and LQCD (Lattice Quantum Chromodynamics) code generated from an LQCD compiler. The result of this work is a research environment that allows the community to experiment with novel ideas in RL-driven loop-nest optimization.
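To make the multi-discrete formulation concrete, the following Python sketch compares a flat enumeration of joint actions with a factored one. It is an illustration under stated assumptions, not the MLIR RL implementation: the subspace names, the tile sizes, and all function names are hypothetical. The last helper only counts loop permutations to show why interchange motivates a smaller action space; it does not implement the level-pointers method, whose details are not given in the abstract.

```python
from itertools import product
from math import factorial, prod

# Hypothetical sub-action spaces; names and sizes are illustrative only
# and are not taken from MLIR RL.
SUBSPACES = {
    "transformation": ["tile", "interchange", "vectorize", "stop"],
    "tile_size_loop0": [1, 4, 8, 16, 32],
    "tile_size_loop1": [1, 4, 8, 16, 32],
}


def flat_action_count(subspaces):
    """Number of joint actions if the Cartesian product is enumerated flat."""
    return prod(len(choices) for choices in subspaces.values())


def multi_discrete_output_count(subspaces):
    """Number of policy outputs with one categorical head per subspace."""
    return sum(len(choices) for choices in subspaces.values())


def decode(indices, subspaces):
    """Map one index per subspace to the concrete joint action."""
    return {
        name: choices[i]
        for (name, choices), i in zip(subspaces.items(), indices)
    }


def interchange_permutations(depth):
    """Number of loop orders for a nest of the given depth (depth!)."""
    return factorial(depth)


if __name__ == "__main__":
    print("flat joint actions:", flat_action_count(SUBSPACES))
    print("factored policy outputs:", multi_discrete_output_count(SUBSPACES))
    print("example action:", decode((1, 2, 4), SUBSPACES))
    print("loop orders for depth 7:", interchange_permutations(7))
    # Sanity check: enumerating the product matches the flat count.
    assert len(list(product(*SUBSPACES.values()))) == flat_action_count(SUBSPACES)
```

In this toy setting the joint space grows as the product of the subspace sizes, while a factored policy needs only their sum in outputs. The factorial growth of loop orders suggests why a dedicated mechanism for interchange is useful as nests get deeper.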
