Fusing Gathers with Integer Linear Programming
David van Balen, Gabriele Keller, Ivo Gabe de Wolff, Trevor L. McDonell
TL;DR
This paper addresses the problem of optimally fusing sequences of data-parallel array combinators, including order-changing operations such as gather, by formulating fusion as an Integer Linear Programming (ILP) problem. It extends prior ILP approaches with formal rules for vertical, horizontal, and diagonal fusion, introduces a cluster notation with traversal-order annotations, and models gather- and backpermute-like behaviors within the ILP constraints. Five cost functions drive the partitioning decisions and support different optimization goals, such as minimizing the number of clusters or the number of memory accesses. The method is evaluated in the Accelerate framework against greedy baselines on benchmarks including LULESH, FlashAttention, and Multigrid. The results show that ILP can find optimal or near-optimal clusterings within practical solve times for moderate-sized programs. They also highlight the need for better cost models that reliably predict runtime on specific hardware, and point to future work on handling work duplication and supporting a richer set of combinators.
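As a rough illustration of what "fusion as an optimization over clusterings" can look like, the Python sketch below assigns an integer cluster number to each node of a small dataflow graph and minimizes the number of edges that cross clusters. It is not the paper's encoding. The four-node graph, the choice of which edge is fusion-preventing, the cost function, and the exhaustive search that stands in for an ILP solver are all assumptions made for this example.

```python
from itertools import product

# Hypothetical four-node dataflow graph (a diamond); not taken from the paper.
# Each edge is (producer, consumer, fusible).
EDGES = [
    (0, 1, True),
    (0, 2, False),  # assumed fusion-preventing edge
    (1, 3, True),
    (2, 3, True),
]
NUM_NODES = 4


def cost(assignment):
    """Count edges whose endpoints lie in different clusters."""
    return sum(1 for p, c, _ in EDGES if assignment[p] != assignment[c])


def feasible(assignment):
    """Check the ordering constraints on the cluster numbers."""
    for p, c, fusible in EDGES:
        if assignment[p] > assignment[c]:
            return False  # a consumer's cluster may not precede its producer's
        if not fusible and assignment[p] == assignment[c]:
            return False  # a fusion-preventing edge must cross clusters
    return True


def solve():
    """Exhaustively search cluster numbers; a stand-in for an ILP solver."""
    candidates = product(range(NUM_NODES), repeat=NUM_NODES)
    return min((a for a in candidates if feasible(a)), key=cost)


best = solve()
print(best, cost(best))
```

In this toy graph the minimum cost is 2. The edge from node 0 to node 2 must cross clusters, and fusing all three remaining edges would force nodes 0 and 2 into the same cluster, so at least one more edge has to be cut. Because cluster numbers never decrease along an edge, the resulting clusters can always be executed in numeric order.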
Abstract
We present an approach based on Integer Linear Programming to finding the optimal fusion strategy for combinator-based parallel programs. While combinator-based languages or libraries provide a convenient interface for programming parallel hardware, fusing combinators into more complex operations is essential to achieve the desired performance. Our approach is suitable not only for languages with the usual map, fold, scan, indexing and scatter operations, but also for those with gather operations, which access arrays in arbitrary order, and it therefore goes beyond traditional producer-consumer fusion. It can be parametrised with appropriate cost functions, and is fast enough to be suitable for just-in-time compilation.
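To make the difference between ordinary producer-consumer fusion and fusion through a gather concrete, the following Python sketch contrasts the two. It is an illustration only and is not taken from the paper or from Accelerate: plain lists stand in for arrays, and every function name is made up for this example.

```python
def map_then_map_unfused(f, g, xs):
    """Two traversals with a materialised intermediate array."""
    tmp = [f(x) for x in xs]
    return [g(t) for t in tmp]


def map_then_map_fused(f, g, xs):
    """One traversal; element i of the consumer needs only element i of the producer."""
    return [g(f(x)) for x in xs]


def gather(indices, xs):
    """Read xs at arbitrary positions given by indices."""
    return [xs[i] for i in indices]


def map_then_gather_unfused(f, indices, xs):
    """Materialise the mapped array, then read it in arbitrary order."""
    tmp = [f(x) for x in xs]
    return gather(indices, tmp)


def map_then_gather_fused(f, indices, xs):
    """Compute the producer's value on demand at each requested index."""
    return [f(xs[i]) for i in indices]


def double(x):
    return 2 * x


def inc(x):
    return x + 1


xs = [10, 20, 30, 40]
idx = [3, 0, 0, 2]
print(map_then_map_fused(double, inc, xs))
print(map_then_gather_fused(double, idx, xs))
```

Both fused versions return the same results as their unfused counterparts. The fused gather applies `f` once per index rather than once per input element, so a repeated index (here `0`) recomputes the same value, while an element that is never indexed (here position `1`) is never computed.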
