Tightening I/O Lower Bounds through the Hourglass Dependency Pattern
Lionel Eyraud-Dubois, Guillaume Iooss, Julien Langou, Fabrice Rastello
TL;DR
The paper derives data movement (I/O complexity) lower bounds for linear algebra kernels by introducing the hourglass dependency pattern. It adapts the $K$-partitioning method to exploit this pattern, yielding tighter parametric lower bounds for MGS (Modified Gram-Schmidt), the Householder kernels A2V and V2Q, GEBD2 (bidiagonal reduction), and GEHD2 (Hessenberg reduction), and it presents tiled algorithms whose I/O upper bounds asymptotically match these lower bounds. The proof technique is automated in the IOLB tool, which can then derive I/O bounds for other kernels exhibiting hourglass patterns. The results sharpen the understanding of memory-bound behavior in core linear algebra routines and give guidance for choosing schedules and tilings that minimize data movement, which matters for the performance and energy efficiency of high-performance computing workloads built on QR factorizations and related reductions.
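To make the pattern concrete, here is a minimal pure-Python sketch of Modified Gram-Schmidt, one of the kernels listed above. The comments mark what we assume the hourglass to be: a reduction over the row index `i` into a single scalar, followed by a broadcast of that scalar back over the same index. This reading of the pattern, the function name `mgs`, and the list-of-lists matrix layout are illustrative assumptions, not details taken from this summary.

```python
import math

def mgs(A):
    """Modified Gram-Schmidt on an m x n matrix given as a list of rows.

    Returns (Q, R) with Q having orthonormal columns and A ~= Q @ R.
    Assumes the columns of A are linearly independent.
    """
    m, n = len(A), len(A[0])
    Q = [[float(x) for x in row] for row in A]   # work on a copy of A
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Reduction: the whole column k collapses into one scalar (its norm).
        R[k][k] = math.sqrt(sum(Q[i][k] ** 2 for i in range(m)))
        # Broadcast: that scalar is used by every entry of column k.
        for i in range(m):
            Q[i][k] /= R[k][k]
        for j in range(k + 1, n):
            # Reduction over i: m products are summed into the scalar R[k][j].
            R[k][j] = sum(Q[i][k] * Q[i][j] for i in range(m))
            # Broadcast over i: R[k][j] updates all m entries of column j.
            for i in range(m):
                Q[i][j] -= R[k][j] * Q[i][k]
    return Q, R

if __name__ == "__main__":
    Q, R = mgs([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(Q)
    print(R)
```

Each `(k, j)` iteration funnels `m` values into one scalar and fans it back out to `m` values, which is the narrow-waist shape the name "hourglass" suggests.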
Abstract
When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role and strongly affects performance and energy consumption. For a given algorithm and a given I/O model, scheduling strategies such as loop tiling can reduce the required I/O down to a limit, called the I/O complexity, that is inherent to the algorithm itself. The objective of I/O complexity analysis is to compute, for a given program, its minimal I/O requirement among all valid schedules. We consider a sequential execution model with two memories: an infinite one, and a small one of size $S$ from which the computations read their operands and to which they write their results. The I/O is the number of reads and writes between the two memories. We identify a common "hourglass pattern" in the dependency graphs of several common linear algebra kernels. Using the properties of this pattern, we mathematically prove tighter lower bounds on their I/O complexity, improving on the previous state-of-the-art bounds by a parametric ratio. This proof has been integrated into the IOLB automatic lower bound derivation tool.
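The two-memory model and the effect of tiling can be explored with a small simulation. The sketch below rests on simplifying assumptions that are ours, not the paper's: the small memory is managed by an LRU policy (the model in the abstract lets the schedule choose what to keep), only loads into the small memory are counted, and the kernel is a plain matrix product rather than one of the hourglass kernels. The names `FastMemory` and `matmul_loads` are hypothetical.

```python
from collections import OrderedDict

class FastMemory:
    """Small memory holding at most `size` elements, with LRU eviction."""

    def __init__(self, size):
        self.size = size
        self.resident = OrderedDict()  # keys ordered from least to most recent
        self.loads = 0                 # transfers from the infinite memory

    def touch(self, key):
        if key in self.resident:
            self.resident.move_to_end(key)         # hit: refresh recency
        else:
            self.loads += 1                        # miss: one load
            self.resident[key] = True
            if len(self.resident) > self.size:
                self.resident.popitem(last=False)  # evict least recently used

def matmul_loads(n, S, tile):
    """Count loads for C += A * B on n x n matrices with tile x tile blocking.

    Setting tile = n gives the plain i, j, k loop order.
    """
    mem = FastMemory(S)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        for k in range(k0, min(k0 + tile, n)):
                            mem.touch(("A", i, k))
                            mem.touch(("B", k, j))
                            mem.touch(("C", i, j))
    return mem.loads

if __name__ == "__main__":
    n, S = 16, 64
    print("untiled loads:", matmul_loads(n, S, tile=n))
    print("tiled loads:  ", matmul_loads(n, S, tile=4))
```

With `tile=4`, the three blocks in use (48 elements) fit in a small memory of 64 elements, so the tiled schedule performs fewer loads than the untiled one. No schedule can go below the $3n^2$ loads needed to bring each element in once; a lower-bound analysis asks how far above that floor every valid schedule must remain.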
