Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays

Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Andy D. Pimentel, Andreas Salzburger, Attila Krasznahorkay

TL;DR

This work tackles the problem of cache-aware data layouts for multi-dimensional arrays by generalizing the Morton order into a large, searchable family of Morton-like layouts. It proposes a combinatorial representation of layouts and a cache-simulation-based evaluation framework that drives a genetic-algorithm search. The fitness function, defined as $F(I; A, H) = \frac{\mathrm{L1}_{hit}(I; A, H) + \mathrm{L1}_{miss}(I; A, H)}{\mathrm{L1}_{lat}(H) \cdot C(I; A, H)}$, is shown to correlate with kernel running time on real hardware. In experiments on eight access patterns and two processors, the approach finds layouts that outperform the canonical ones in several cases, with speedups of up to a factor of ten on real hardware for certain patterns and architectures. The work demonstrates that automating layout discovery with evolutionary methods can yield substantial cache-related performance gains with minimal code changes, while acknowledging the limitations of the simulation model and the need for broader validation. Future directions include multi-objective optimization across applications, more advanced evolutionary strategies, and extending the method to GPUs and other architectures.
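
The genetic-algorithm search summarized above can be illustrated with a minimal, mutation-only evolutionary loop. This is a sketch under stated assumptions, not the authors' implementation: a chromosome is assumed to be a string such as `"010101"` whose digit at each position names the array dimension that supplies that address bit; mutation swaps two genes so the number of bits per dimension stays fixed; and selection is a two-way tournament with elitism. The names `swap_mutation`, `tournament`, and `evolve` are hypothetical, crossover is omitted for brevity, and the fitness function is passed in as a callable (in the paper it is derived from cache simulation).

```python
import random
from typing import Callable

# Hypothetical chromosome encoding (an assumption): the digit at each
# position names the array dimension that supplies that address bit.
Chromosome = str


def swap_mutation(chrom: Chromosome, rng: random.Random) -> Chromosome:
    """Swap two genes that name different dimensions.

    Swapping instead of flipping keeps the number of address bits taken
    from each dimension fixed, so the mutant is still a valid layout.
    """
    a = rng.randrange(len(chrom))
    candidates = [b for b in range(len(chrom)) if chrom[b] != chrom[a]]
    if not candidates:  # every gene names the same dimension
        return chrom
    b = rng.choice(candidates)
    genes = list(chrom)
    genes[a], genes[b] = genes[b], genes[a]
    return "".join(genes)


def tournament(pop: list[Chromosome],
               fitness: Callable[[Chromosome], float],
               rng: random.Random) -> Chromosome:
    """Two-way tournament: the fitter of two randomly drawn individuals wins."""
    x, y = rng.sample(pop, 2)
    return x if fitness(x) >= fitness(y) else y


def evolve(population: list[Chromosome],
           fitness: Callable[[Chromosome], float],
           generations: int,
           rng: random.Random) -> Chromosome:
    """Mutation-only evolutionary loop; higher fitness is better.

    Requires at least two individuals. Parents and offspring compete for
    the next generation (elitism), so the best fitness never decreases.
    """
    pop = list(population)
    size = len(pop)
    for _ in range(generations):
        offspring = [swap_mutation(tournament(pop, fitness, rng), rng)
                     for _ in range(size)]
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:size]
    return max(pop, key=fitness)
```

For experimentation, a toy fitness such as "number of adjacent genes that differ" rewards interleaved, Morton-like chromosomes; a realistic search would substitute a cache-simulation-based fitness.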

Abstract

The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates with kernel running time on real hardware, and that our evolutionary strategy finds, within a small number of generations, candidates with favorable simulated cache properties for four of the eight real-world applications under consideration. Finally, we demonstrate that the array layouts found using our evolutionary method not only perform well in simulated environments but can also effect significant performance gains (up to a factor of ten in extreme cases) on real hardware.
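
One way to make the generalized Morton family concrete: for an array whose extents are powers of two, a layout can be described by a string stating, for each bit of the linear offset, which dimension's coordinate supplies it. The sketch below assumes that reading (pattern read most-significant address bit first, coordinate bits consumed from least significant upward), which agrees with the Figure 2 caption's statement that pattern `000111` is row-major and `111000` is column-major for 8×8 arrays. The function name `layout_offset` is hypothetical, and the paper's exact encoding may differ.

```python
def layout_offset(pattern: str, coords: tuple[int, ...]) -> int:
    """Linear offset of an element under a generalized Morton layout.

    Assumed encoding: pattern is read most-significant address bit first,
    and the digit at each position names the dimension that supplies that
    bit. Bits of each coordinate are consumed from least significant upward.
    Single-digit dimension names limit this sketch to at most 10 dimensions.
    """
    next_bit = [0] * len(coords)  # next unused bit of each coordinate
    offset = 0
    # p is the address bit being filled, counting from the least significant.
    for p, dim_char in enumerate(reversed(pattern)):
        d = int(dim_char)
        offset |= ((coords[d] >> next_bit[d]) & 1) << p
        next_bit[d] += 1
    return offset
```

For an 8×8 array indexed as `(row, column)`, `layout_offset("000111", (2, 5))` gives the row-major offset 2 · 8 + 5, `"111000"` gives the column-major offset 5 · 8 + 2, and `"010101"` interleaves row and column bits into a Z-order (Morton) curve.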

Paper Structure

This paper contains 21 sections, 8 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Two-dimensional arrays laid out in memory along the gray arrows. An application accesses the array diagonally along the red arrows. Application locality is shown above, memory locality is shown below.
  • Figure 2: All 20 layouts for $8\times 8$ arrays generated by the family of indexing schemes described in the section on bijections. Note that the subfigure labeled 000111 corresponds to a row-major layout, while the subfigure labeled 111000 corresponds to a column-major layout.
  • Figure 3: Throughput of a kernel calculating array indices using canonical layouts as well as Morton-like layouts on the Intel Haswell microarchitecture as given by OSACA.
  • Figure 4: Scatter plot of fitness against measured running time on an Intel Xeon E5-2660 v3 and an AMD EPYC 7413 CPU for randomly chosen array layouts.
  • Figure 5: Distribution of the fitness values for all individuals found across evolution experiments for eight access patterns and two processors.
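
The count of 20 layouts in Figure 2 follows from choosing which 3 of the 6 address bits of an 8×8 array come from the column index, i.e. $\binom{6}{3} = 20$. The sketch below enumerates those layouts under an assumed bit-pattern reading consistent with Figure 2 (`000111` is row-major) and, as a rough illustration of the diagonal access in Figure 1, counts misses in a toy fully associative LRU cache. The cache parameters (4 elements per line, 4 lines), the anti-diagonal traversal order, and all function names are hypothetical; this is not the cache model used in the paper.

```python
from collections import OrderedDict
from itertools import combinations


def all_patterns(bits_per_dim: int) -> list[str]:
    """All 2-D layout patterns using bits_per_dim address bits per dimension."""
    n = 2 * bits_per_dim
    return ["".join("1" if p in ones else "0" for p in range(n))
            for ones in combinations(range(n), bits_per_dim)]


def offset(pattern: str, i: int, j: int) -> int:
    """Offset of (row i, column j); pattern read MSB first (assumed encoding)."""
    coords, next_bit, result = (i, j), [0, 0], 0
    for p, ch in enumerate(reversed(pattern)):
        d = int(ch)
        result |= ((coords[d] >> next_bit[d]) & 1) << p
        next_bit[d] += 1
    return result


def anti_diagonal_order(n: int) -> list[tuple[int, int]]:
    """Visit an n x n array one anti-diagonal (i + j = s) at a time."""
    return [(i, s - i) for s in range(2 * n - 1)
            for i in range(n) if 0 <= s - i < n]


def lru_misses(addresses: list[int], line_elems: int, num_lines: int) -> int:
    """Misses in a toy fully associative LRU cache; addresses are element indices."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_elems
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    order = anti_diagonal_order(8)
    for pat in all_patterns(3):
        addrs = [offset(pat, i, j) for i, j in order]
        print(pat, lru_misses(addrs, line_elems=4, num_lines=4))
```

Lower counts indicate layouts whose memory locality better matches this particular traversal under the toy model; every layout incurs at least 16 misses, since the 64 elements occupy 16 distinct cache lines.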