GOMA: Geometrically Optimal Mapping via Analytical Modeling for Spatial Accelerators

Wulve Yang; Hailong Zou; Rui Zhou; Jionghao Zhang; Qiang Li; Gang Li; Yi Zhan; Shushan Qiao

TL;DR

A geometric-abstraction-based, globally optimal GEMM mapping framework built on analytical modeling. It solves efficiently while guaranteeing optimality and quickly computes a globally optimal mapping for any GEMM workload, the first approach in mapping space exploration to do so.

Abstract

General matrix multiplication (GEMM) on spatial accelerators is highly sensitive to mapping choices in both execution efficiency and energy consumption. However, the mapping space grows combinatorially, which makes it extremely challenging to obtain optimal mappings within an acceptable time budget. Existing approaches face two main limitations: they often lack global-optimality guarantees, and they become prohibitively slow as the mapping space grows. To address these limitations, we propose GOMA, a geometric-abstraction-based, globally optimal GEMM mapping framework built on analytical modeling, which solves efficiently while guaranteeing optimality. GOMA introduces, from first principles, a geometric abstraction for GEMM mapping, yielding an exact analytical energy objective that is highly accurate and can be evaluated in $O(1)$ time for any given mapping. GOMA then formulates mapping selection as an integer optimization problem under hardware and mapping constraints, using the analytical energy model as the objective to automate mapping search. GOMA can quickly compute a globally optimal mapping for any (GEMM workload, target hardware) pair, the first approach in mapping space exploration to achieve this. Experiments across representative accelerators and large language model (LLM) prefill workloads show that GOMA improves the energy-delay product (EDP) by $2.24$--$4.24\times$ over state-of-the-art (SOTA) mappers, while reducing time-to-solution by $3.83$--$73.6\times$.
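The combinatorial growth of the mapping space can be made concrete with a toy count. The sketch below assumes a hypothetical parameterization that is not taken from the paper: for each of the M, N, K dimensions, a mapping picks one tile factor per memory level (an ordered factorization of that dimension), one of the 3! loop orders per level, and, for each of the three operands, whether to bypass each intermediate level. The function names are illustrative only.

```python
from math import comb


def prime_exponents(n):
    """Return the exponents of n's prime factorization, e.g. 12 -> [2, 1]."""
    exps, p = [], 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            exps.append(e)
        p += 1
    if n > 1:  # leftover prime factor with exponent 1
        exps.append(1)
    return exps


def ordered_factorizations(n, levels):
    """Number of ways to write n as an ordered product of `levels` positive
    integers (one tile factor per memory level); stars and bars per prime."""
    count = 1
    for e in prime_exponents(n):
        count *= comb(e + levels - 1, levels - 1)
    return count


def mapping_space_size(m, n, k, levels, bypass=True):
    """Toy count of (tiling, loop order, bypass) choices for an M x N x K GEMM.

    Hypothetical parameterization (not GOMA's): every level picks one of the
    3! loop orders, and each of the 3 operands may independently bypass each
    level strictly between the outermost and innermost ones.
    """
    tilings = 1
    for dim in (m, n, k):
        tilings *= ordered_factorizations(dim, levels)
    orders = 6 ** levels
    bypasses = 2 ** (3 * max(levels - 2, 0)) if bypass else 1
    return tilings * orders * bypasses
```

Under these assumptions, a 4096 x 4096 x 4096 GEMM on a four-level hierarchy gives `mapping_space_size(4096, 4096, 4096, 4)` = 455^3 * 6^4 * 2^6, roughly 7.8 x 10^12 candidates. Even with a cheap per-mapping evaluation, exhaustive enumeration at that scale is impractical.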

Paper Structure

This paper contains 69 sections, 37 equations, 9 figures, and 3 tables.

Introduction
Related Works
Random Search
Black-box Heuristic Search
Differentiable Model Approximation
Pruned Enumeration
Mathematical Programming
Intuition of GOMA
3D compute grid
Data as Projections: Three Projections and Hierarchical Tiles
Three Projections
Hierarchical Tiles
Parallelism Stacking
How Traversal Determines Reuse: Walking Axis and Projection Update Counting
Bypass and the Reduction Axis: Path Rewriting and the Optimization Objective

Figures (9)

Figure 1: Mapping in a spatial accelerator. A mapping specifies tiling, loop permutation, and level bypass.
Figure 2: Energy variation across different mappings for the same GEMM on a spatial accelerator (log scale). Each point represents a mapping configuration.
Figure 3: Geometric view of GEMM as a 3D compute grid and its three orthogonal projections corresponding to $A(x,z)$, $B(y,z)$, and partial sums/output $P(x,y)$. Mapping executes GEMM by hierarchically tiling the grid across the memory hierarchy.
Figure 4: Walking-axis intuition. When a tile advances along one axis (here $y$), one projection (the $x$--$z$ plane) stays constant and can be reused, while the other two projections update.
Figure 5: Overview of GOMA: geometric abstraction of mapping, closed-form update-count energy model with bypass gating, and global optimization that outputs the optimal tiling/dataflow/bypass together with an optimality certificate.
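Figures 3 and 4 describe the counting intuition: when a tile walks along an axis that a projection does not span, that projection stays constant and is reused. A minimal sketch of this idea follows. It assumes a single buffer level that holds one tile per projection and a tile loop nest given outermost-first as `order`, and it checks a brute-force walk of the loop nest against a closed-form count. The names `PROJ`, `simulate_updates`, and `closed_form_updates` are hypothetical; this is not GOMA's published energy model, which also spans the memory hierarchy and applies bypass gating.

```python
from itertools import product

# Hypothetical illustration of the walking-axis idea (not GOMA's code).
# The grid is split into trips[a] tiles along each axis a in {x, y, z};
# each projection keeps only the tile coordinates on the axes it spans.
PROJ = {"A": ("x", "z"), "B": ("y", "z"), "P": ("x", "y")}


def simulate_updates(trips, order):
    """Walk the tile loop nest (order[0] outermost) and count how often each
    projection's tile changes, assuming a one-tile buffer per projection."""
    counts = {name: 0 for name in PROJ}
    current = {name: None for name in PROJ}
    for idx in product(*(range(trips[a]) for a in order)):
        pos = dict(zip(order, idx))
        for name, axes in PROJ.items():
            tile = tuple(pos[a] for a in axes)
            if tile != current[name]:  # projection updates: refetch its tile
                counts[name] += 1
                current[name] = tile
    return counts


def closed_form_updates(trips, order):
    """Closed-form count: product of trip counts from the outermost loop down
    to the innermost loop the projection spans that actually advances
    (trips > 1); loops nested inside it walk along an unused axis."""
    counts = {}
    for name, axes in PROJ.items():
        live = [order.index(a) for a in axes if trips[a] > 1]
        last = max(live, default=-1)
        n = 1
        for a in order[: last + 1]:
            n *= trips[a]
        counts[name] = n
    return counts
```

For `trips = {"x": 2, "y": 3, "z": 4}` and order `"xzy"` (the tile walks along y innermost, as in Figure 4), `closed_form_updates` returns `{'A': 8, 'B': 24, 'P': 24}`. The $x$-$z$ projection is reused across all three y steps, while the other two projections update on every step. The closed form touches only three loops per projection regardless of problem size, which illustrates how a per-mapping evaluation can take $O(1)$ time.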