The Next 700 ML-Enabled Compiler Optimizations

S. VenkataKeerthy; Siddharth Jain; Umesh Kalvakuntla; Pranav Sai Gorantla; Rajiv Shailesh Chitale; Eugene Brevdo; Albert Cohen; Mircea Trofin; Ramakrishna Upadrasta

TL;DR

This work addresses the challenge of integrating ML-based optimization strategies into traditional compilers without locking into a single ML framework. It introduces ML-Compiler-Bridge, a modular library that provides inter-process and in-process model runners and multiple SerDes options to enable tight, framework-agnostic coupling between ML models and compilers such as LLVM, Pluto, and MLIR. Through four ML-enabled LLVM optimizations (POSET-RL, RL-LoopDistribution, RL4ReAl, and Inliner), the paper demonstrates substantial reductions in deployment latency, training time, and round-trip communication overhead, while offering multi-language APIs and an extensible design. The results indicate practical potential for production-ready ML-driven compiler optimizations, with measurable impact on compile time, binary size, and deployment complexity, and with pathways to broader adoption across domain-specific compilers and Gym-style environments.

Abstract

There is growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance, and framework independence. Practical deployment and transparency for the end user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and their versions, and gym infrastructures.

Paper Structure (48 sections, 6 figures, 5 tables)

Introduction
Background
ML-enabled Compiler Optimizations
Training
Inference
ML-Compiler-Bridge
ML Model Runners
Inter-process Model Runners
gRPC Model Runner
Pipe Model Runner
In-process Model Runners
ONNX Model Runner
ONNXModelRunner for RL
ONNXModelRunner for plain ML models
TensorFlow Model Runners
...and 33 more sections

Figures (6)

Figure 1: ML-enabled compiler optimizations: (1) Inputs and other metadata required by the model are prepared in the appropriate format. (2) The serialized input is passed to the model over a suitable communication channel. (3) The input is deserialized to the appropriate format. (4) The model is queried to obtain optimization decisions as output. (5) The output is serialized and (6) sent back to the compiler optimization as a response. (7) The received response is deserialized, and optimization decisions are taken according to the output.
Figure 2: The compiler instantiates a model runner and sets the input features to be used by the model. MLModelRunner internally invokes SerDes to serialize the data in one of the supported formats and queries the model. The returned decision is deserialized and provided to the optimization.
Figure 3: Sequence diagram indicating different events and the interaction between various classes for RL-based optimization by ONNXModelRunner. Only the highlighted methods are to be overridden by the user. Other methods are internal to the library.
Figure 4: Class diagram of ML-Compiler-Bridge
Figure 5: Performance characterization of model runners on different compilers and optimizations
...and 1 more figure
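
The round trip described in the captions of Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names JsonSerDes, SketchModelRunner, set_feature, evaluate, and toy_inline_model are hypothetical and are not the ML-Compiler-Bridge API, JSON stands in for whichever SerDes format is chosen, and a plain function call stands in for the communication channel.

```python
import json
from typing import Callable, Dict, List

# Hypothetical types and names for illustration; not the library's API.
Features = Dict[str, List[float]]
Model = Callable[[Features], Features]


class JsonSerDes:
    """Serializes feature and decision dictionaries as UTF-8 JSON bytes."""

    @staticmethod
    def serialize(payload: Features) -> bytes:
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    @staticmethod
    def deserialize(raw: bytes) -> Features:
        return json.loads(raw.decode("utf-8"))


class SketchModelRunner:
    """Toy in-process runner that follows the seven steps of Figure 1."""

    def __init__(self, model: Model, serdes=JsonSerDes) -> None:
        self._model = model
        self._serdes = serdes
        self._features: Features = {}

    def set_feature(self, name: str, values: List[float]) -> None:
        # Step 1: the compiler prepares the inputs the model needs.
        self._features[name] = list(values)

    def evaluate(self) -> Features:
        # Steps 1-2: serialize the inputs and hand them to the model side.
        request = self._serdes.serialize(self._features)
        response = self._serve(request)
        # Step 7: deserialize the response into a decision.
        decision = self._serdes.deserialize(response)
        self._features.clear()
        return decision

    def _serve(self, request: bytes) -> bytes:
        # Step 3: the model side deserializes the input.
        inputs = self._serdes.deserialize(request)
        # Step 4: the model is queried for a decision.
        outputs = self._model(inputs)
        # Steps 5-6: the output is serialized and returned to the compiler.
        return self._serdes.serialize(outputs)


def toy_inline_model(inputs: Features) -> Features:
    # Fixed threshold standing in for a trained model: inline small callees.
    callee_size = inputs["callee_size"][0]
    return {"should_inline": [1.0 if callee_size < 50 else 0.0]}


runner = SketchModelRunner(toy_inline_model)
runner.set_feature("callee_size", [12.0])
decision = runner.evaluate()
```

Because the runner only talks to the model through the serializer interface, swapping JSON for another wire format or replacing the direct call in _serve with a pipe or RPC would not change the compiler-facing calls, which is the kind of decoupling the captions describe.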
