LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

Massinissa Merouani, Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi

TL;DR

LOOPer tackles the challenge of selecting profitable affine transformations in polyhedral compilers by deploying a deep-learning cost model to guide a beam-search over an expansive transformation space. It extends prior DL-based cost models to support multiple loop nests and non-rectangular iteration domains, and it integrates a structured, AST-inspired neural architecture with a three-level candidate-generation pipeline. Empirically, LOOPer achieves substantial speedups on PolyBench compared to Pluto and the Tiramisu autoscheduler, and it offers a favorable speed-accuracy trade-off relative to measurement-guided exploration. The work provides large-scale data resources (LOOPerSet) and demonstrates practical portability considerations, highlighting both the promise and the remaining challenges of data-driven autoscheduling in polyhedral compilation.

Abstract

While polyhedral compilers have shown success in implementing advanced code transformations, they still face challenges in selecting the ones that lead to the most profitable speedups. This has motivated the use of machine learning based cost models to guide the search for polyhedral optimizations. State-of-the-art polyhedral compilers have demonstrated a viable proof-of-concept of such an approach. While promising, this approach still faces significant limitations. State-of-the-art polyhedral compilers that use a deep learning cost model only support a small subset of affine transformations, limiting their ability to explore complex code transformations. Furthermore, their applicability does not scale beyond simple programs, thus excluding many program classes from their scope, such as those with non-rectangular iteration domains or multiple loop nests. These limitations significantly impact the generality of such compilers and autoschedulers and put into question the whole approach. In this paper, we introduce LOOPer, the first polyhedral autoscheduler that uses a deep learning based cost model and covers a large space of affine transformations and programs. LOOPer allows the optimization of an extensive set of programs while being effective at applying complex sequences of polyhedral transformations. We implement and evaluate LOOPer and show that it achieves competitive speedups over the state-of-the-art. On the PolyBench benchmarks, LOOPer achieves a geometric mean speedup of 1.84x over Tiramisu and 1.42x over Pluto, two state-of-the-art polyhedral autoschedulers.
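
The search strategy summarized above, a beam search whose candidates are ranked by a cost model instead of by running each one, can be illustrated with a small sketch. This is not LOOPer's implementation: the transformation names, the three candidate levels, the `toy_cost_model` function and its gain table are all hypothetical stand-ins chosen to keep the example self-contained. In the real system the ranking would come from the learned neural cost model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: a schedule is a tuple of transformation names.
Schedule = Tuple[str, ...]


def beam_search(
    candidates_per_level: Sequence[Sequence[str]],
    predict_speedup: Callable[[Schedule], float],
    beam_width: int = 2,
) -> Tuple[Schedule, float]:
    """Explore schedules level by level, keeping the top `beam_width`
    schedules according to the predicted speedup."""
    beam: List[Tuple[Schedule, float]] = [((), predict_speedup(()))]
    best = beam[0]
    for level in candidates_per_level:
        # Keep the current schedules so a level can be skipped entirely.
        expanded = list(beam)
        for schedule, _ in beam:
            for transform in level:
                extended = schedule + (transform,)
                expanded.append((extended, predict_speedup(extended)))
        # Rank by predicted speedup and prune to the beam width.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
        if beam[0][1] > best[1]:
            best = beam[0]
    return best


# Hypothetical multiplicative gains; a stand-in for a learned cost model.
TOY_GAINS = {
    "interchange": 1.2,
    "skew": 0.9,
    "tile": 1.5,
    "parallelize": 3.0,
    "unroll": 1.1,
}


def toy_cost_model(schedule: Schedule) -> float:
    """Return a made-up predicted speedup for a schedule."""
    speedup = 1.0
    for transform in schedule:
        speedup *= TOY_GAINS[transform]
    return speedup


# Hypothetical three-level candidate generation.
LEVELS = [["interchange", "skew"], ["tile"], ["parallelize", "unroll"]]

best_schedule, best_speedup = beam_search(LEVELS, toy_cost_model, beam_width=2)
print(best_schedule, round(best_speedup, 2))
```

The point of the design is that `predict_speedup` is called many times per level, so replacing compilation and measurement with a model prediction is what makes a wide search affordable. The price is that the search is only as good as the model's ranking.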

Paper Structure (42 sections, 1 equation, 6 figures, 3 tables)

This paper contains 42 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Input representation example.
  • Figure 2: The architecture of LOOPer's neural network. The dim-colored elements are parts of the original Tiramisu cost model. The bright-colored parts represent our contributions to the architecture.
  • Figure 3: Predicted speedups compared to measured speedups.
  • Figure 4: Speedups of LOOPer (using the cost model and using the actual measurements) compared to Pluto and Pluto+. The speedups are aggregated by geometric mean over the five sizes of each benchmark. The benchmarks are sorted by descending order of LOOPer's speedups.
  • Figure 5: Speedups of LOOPer compared to the Tiramisu autoscheduler. The speedups are aggregated by geometric mean over the five sizes of each benchmark that the Tiramisu autoscheduler supports.
  • ...and 1 more figures
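
The captions of Figures 4 and 5 state that speedups are aggregated by geometric mean over the five sizes of each benchmark. A minimal sketch of that aggregation is shown here; the benchmark names and speedup values are invented for illustration and are not results from the paper.

```python
import math
from typing import Dict, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values, computed in log space."""
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("speedups must be strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_by_benchmark(
    speedups: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Collapse the per-size speedups of each benchmark into one value."""
    return {name: geometric_mean(sizes) for name, sizes in speedups.items()}


# Hypothetical speedups for five problem sizes of two made-up benchmarks.
example_speedups = {
    "bench_a": [1.0, 2.0, 4.0, 8.0, 16.0],
    "bench_b": [0.5, 1.0, 2.0, 1.0, 1.0],
}

per_benchmark = aggregate_by_benchmark(example_speedups)
for name, value in per_benchmark.items():
    print(f"{name}: {value:.2f}x")
```

The geometric mean is the usual choice for ratios such as speedups because a 2x gain and a 0.5x slowdown cancel to 1x, whereas an arithmetic mean would report a net gain.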