Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays
Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Andy D. Pimentel, Andreas Salzburger, Attila Krasznahorkay
TL;DR
This work tackles the problem of cache-aware data layouts for multi-dimensional arrays by generalizing the Morton order into a large, searchable family of Morton-like layouts. It proposes a chromosomal representation of layouts and a fitness evaluation based on cache simulation to drive a genetic-algorithm search. The fitness function is defined as $F(I; A, H) = \frac{\mathrm{L1}_{hit}(I; A, H) + \mathrm{L1}_{miss}(I; A, H)}{\mathrm{L1}_{lat}(H) \cdot C(I; A, H)}$ and is shown to correlate with kernel running time on real hardware. In experiments on eight access patterns and two processors, the approach finds layouts that outperform the canonical ones in several cases, with speedups of up to a factor of ten on real hardware for certain patterns and architectures. The work demonstrates that automating layout discovery via evolutionary methods can yield substantial cache-related performance gains with minimal code changes, while acknowledging the limitations of the simulation model and the need for broader validation. Future directions include multi-objective optimization across applications, more advanced evolutionary strategies, and extending the method to GPUs and other architectures.
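As a rough illustration, the fitness expression above can be evaluated directly from simulated cache counters. The sketch below is not the authors' implementation: the function name and arguments are hypothetical, and because this summary does not define $C(I; A, H)$, it is treated as an opaque positive cost term supplied by the caller.

```python
def fitness(l1_hits, l1_misses, l1_latency, cost):
    """Evaluate F = (L1_hit + L1_miss) / (L1_lat * C) as written in the summary.

    l1_hits, l1_misses: simulated L1 hit and miss counts for one layout.
    l1_latency: L1 access latency of the modelled hardware.
    cost: the C(I; A, H) term; its definition is an assumption left to the caller.
    """
    if l1_latency <= 0 or cost <= 0:
        raise ValueError("l1_latency and cost must be positive")
    return (l1_hits + l1_misses) / (l1_latency * cost)


# Hypothetical counter values, used only to show the call.
score = fitness(l1_hits=900, l1_misses=100, l1_latency=4.0, cost=10.0)
```

For a fixed number of accesses and a fixed L1 latency, a layout with a lower cost term receives a higher score, which is the direction a genetic algorithm would maximize.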
Abstract
The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates with kernel running time on real hardware, and that our evolutionary strategy finds candidates with favorable simulated cache properties within a small number of generations for four of the eight real-world applications under consideration. Finally, we demonstrate that the array layouts found using our evolutionary method perform well not only in simulated environments but can also yield significant performance gains -- up to a factor of ten in extreme cases -- on real hardware.
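To make the idea of a Morton-like layout family concrete, the following minimal sketch maps array coordinates to a linear offset by interleaving coordinate bits in a configurable order. The encoding is an assumption chosen for illustration and is not taken from the paper: `pattern` is a hypothetical list in which entry `k` names the dimension that supplies bit `k` of the offset, least significant bit first. Under this encoding, row-major, column-major, and the classic Morton order are three members of the same family.

```python
def generalized_morton_index(coords, pattern):
    """Map multi-dimensional coordinates to a linear offset.

    pattern[k] is the dimension that supplies bit k of the offset
    (least significant bit first). Each dimension's bits are consumed
    from least to most significant.
    """
    next_bit = [0] * len(coords)  # next unread bit position per dimension
    offset = 0
    for k, dim in enumerate(pattern):
        bit = (coords[dim] >> next_bit[dim]) & 1
        offset |= bit << k
        next_bit[dim] += 1
    # Reject coordinates that have more bits than the pattern provides.
    for dim, value in enumerate(coords):
        if value >> next_bit[dim]:
            raise ValueError(f"coordinate {value} in dimension {dim} is out of range")
    return offset


# A 4x4 array: dimension 0 is the row, dimension 1 is the column.
ROW_MAJOR = [1, 1, 0, 0]     # column bits low, row bits high
COLUMN_MAJOR = [0, 0, 1, 1]  # row bits low, column bits high
MORTON = [1, 0, 1, 0]        # column and row bits alternate

row, col = 2, 3
offsets = {
    "row-major": generalized_morton_index((row, col), ROW_MAJOR),
    "column-major": generalized_morton_index((row, col), COLUMN_MAJOR),
    "morton": generalized_morton_index((row, col), MORTON),
}
```

Any rearrangement of the entries in `pattern` that keeps the per-dimension bit counts yields another valid one-to-one layout, which is what makes the family large and suggests why a combinatorial search is a natural fit.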
