Compilation of Generalized Matrix Chains with Symbolic Sizes

Francisco López, Lars Karlsson, Paolo Bientinesi

TL;DR

This paper tackles the problem of efficiently evaluating Generalized Matrix Chains (GMCs) whose matrix sizes are symbolic at compile time. It introduces a multi-versioning code generator that emits a small set of variants, together with a run-time dispatcher that selects the best variant for the sizes at hand. Theoretical results guarantee that the selected variant's cost is within a constant factor of the optimum. The method also includes an empirical expansion procedure that adds variants when needed, trading code size for performance. Experiments show substantial improvements over single-variant approaches and competitive performance against Armadillo, with FLOP overheads below $15\%$ in most cases and significant execution-time speedups when the expanded variant sets are used.
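
A minimal sketch of the multi-versioning idea follows. It assumes a chain $A_1 A_2 A_3$ of general matrices, with $A_i$ of size $q_{i-1} \times q_i$, only two hand-written variants, and the usual $2mnk$ FLOP count for an $m \times k$ by $k \times n$ product. The names `Variant`, `VARIANTS`, `matmul`, and `dispatch` are hypothetical. The paper's generator emits BLAS/LAPACK calls and chooses which variants to emit with its own theory-backed procedure. In this sketch, each variant carries a cost function of the sizes. Once the sizes are known, the dispatcher evaluates those functions and runs the cheapest variant.

```python
from dataclasses import dataclass
from typing import Callable

Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Naive dense product; stands in for a BLAS GEMM call in this sketch."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


@dataclass(frozen=True)
class Variant:
    name: str
    cost: Callable[[tuple[int, ...]], int]           # FLOPs as a function of sizes q
    run: Callable[[Matrix, Matrix, Matrix], Matrix]  # the "generated" kernel sequence


# Two variants for A1 A2 A3 (hypothetical), A_i of size q[i-1] x q[i].
VARIANTS = (
    Variant("left-to-right",
            lambda q: 2 * q[0] * q[1] * q[2] + 2 * q[0] * q[2] * q[3],
            lambda A1, A2, A3: matmul(matmul(A1, A2), A3)),
    Variant("right-to-left",
            lambda q: 2 * q[1] * q[2] * q[3] + 2 * q[0] * q[1] * q[3],
            lambda A1, A2, A3: matmul(A1, matmul(A2, A3))),
)


def dispatch(A1: Matrix, A2: Matrix, A3: Matrix) -> tuple[str, Matrix]:
    """Run-time selection: read the actual sizes, evaluate each cost, run the cheapest."""
    q = (len(A1), len(A2), len(A3), len(A3[0]))
    best = min(VARIANTS, key=lambda v: v.cost(q))
    return best.name, best.run(A1, A2, A3)
```

For example, with sizes $q = (1, 4, 4, 4)$ the left-to-right variant costs 64 FLOPs versus 160 for right-to-left, so `dispatch` runs the former. With $q = (4, 4, 4, 1)$ the choice flips.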

Abstract

Generalized Matrix Chains (GMCs) are products of matrices where each matrix carries features (e.g., general, symmetric, triangular, positive-definite) and is optionally transposed and/or inverted. GMCs are commonly evaluated via sequences of calls to BLAS and LAPACK kernels. When matrix sizes are known, one can craft a sequence of kernel calls to evaluate a GMC that minimizes some cost, e.g., the number of floating-point operations (FLOPs). Even in these circumstances, high-level languages and libraries, upon which users usually rely, typically perform a suboptimal mapping of the input GMC onto a sequence of kernels. In this work, we go one step further and consider matrix sizes to be symbolic (unknown); this changes the nature of the problem, since no single sequence of kernel calls is optimal for all possible combinations of matrix sizes. We design and evaluate a code generator for GMCs with symbolic sizes that relies on multi-versioning. At compile time, when the GMC is known but the sizes are not, code is generated for a few carefully selected sequences of kernel calls. At run time, when the sizes become known, the best generated variant for the matrix sizes at hand is selected and executed. The code generator uses new theoretical results that guarantee that the cost is within a constant factor of optimal for all matrix sizes, and an empirical tuning component that further tightens the gap to optimality in practice. In experiments, we found that the increase above optimal in both FLOPs and execution time of the generated code was less than 15% for 95% of the tested chains.
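
When all sizes are known and every matrix is general (no transposition, inversion, or special features), the classic matrix-chain dynamic program gives the FLOP-minimizing parenthesization. The following sketch simplifies the GMC setting and is not the paper's algorithm. It assumes $A_i$ has size $q_{i-1} \times q_i$ and charges $2mnk$ FLOPs per product. The name `optimal_chain_cost` is hypothetical. An optimum computed this way is the kind of reference against which a variant's overhead over optimal can be measured.

```python
from functools import lru_cache


def optimal_chain_cost(q: tuple[int, ...]) -> tuple[int, str]:
    """Minimum FLOPs and a parenthesization for A1 ... An, with A_i of size q[i-1] x q[i]."""
    n = len(q) - 1

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> tuple[int, str]:
        # Cheapest way to compute A_i ... A_j (1-based, inclusive).
        if i == j:
            return 0, f"A{i}"
        candidates = []
        for k in range(i, j):  # split as (A_i ... A_k)(A_{k+1} ... A_j)
            left_cost, left_expr = best(i, k)
            right_cost, right_expr = best(k + 1, j)
            # Final product: a q[i-1] x q[k] matrix times a q[k] x q[j] matrix.
            cost = left_cost + right_cost + 2 * q[i - 1] * q[k] * q[j]
            candidates.append((cost, f"({left_expr} {right_expr})"))
        return min(candidates)

    return best(1, n)
```

For instance, `optimal_chain_cost((10, 100, 5, 50))` returns `(15000, "((A1 A2) A3)")`. The other parenthesization would cost 150000 FLOPs, which shows how far a single fixed variant can be from optimal.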

Paper Structure

This paper contains 19 sections, 4 theorems, 15 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Let $\boldsymbol{q} = (q_0, q_1, \ldots, q_n)$ be an instance and let $m$ be an index such that $q_m = \min_i q_i$. If $t_{\rm e}$ is a term of the form $\phi_{\textsc{k}_{\rm e}}(q_{j-1}, q_j, q_m)$ or $\phi_{\textsc{k}_{\rm e}}(q_m, q_{j-1}, q_j)$ in the cost function of one variant, and $t_{\rm o

Figures (6)

  • Figure 1: A multi-versioning code generator for compiling generalized matrix chains with symbolic sizes.
  • Figure 2: Grammar for the code generator's input.
  • Figure 3: Mapping from features in the association to kernels for the product of matrices (left) and for solving linear systems (right). In both tables, $\mathrm{op}(X) \in \{X, X^T\}$. In the right table, symmetric positive-definite matrices are denoted by $P$. Kernels on a white background are from BLAS; kernels on a gray background are ones we defined and implemented.
  • Figure 4: Lookup tables for inference of structure (left) and property (right).
  • Figure 5: Empirical cumulative distribution functions of the per-instance ratio over the optimum, measured in FLOPs, for the base sets $\mathcal{E}_{\rm s}$ (blue solid line), the sets after expanding by one (red dotted line) and by two (green dashed line), and the singleton containing the left-to-right variant (black dash-dotted line), for $n=5,6,7$. For a given set of variants $\mathcal{S}$ and a point $x_0$ on the x-axis, the corresponding value $y_0$ on the y-axis is the percentage of instances for which the best variant in $\mathcal{S}$ performs at most $x_0$ times as many FLOPs as the optimum.
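
To illustrate the kind of lookup that Figure 3 describes, here is a hypothetical, much simplified kernel selector. The feature names are assumptions, and the precedence between features is a design choice of this sketch. It also omits $\mathrm{op}(X)$ and the custom (gray-background) kernels. Only the routine names are real: the BLAS routines `GEMM`, `SYMM`, `TRMM`, `TRSM` and the LAPACK drivers `GESV`, `SYSV`, `POSV`.

```python
# Hypothetical, simplified kernel lookup in the spirit of Figure 3.
# Feature names ("general", "symmetric", "triangular", "spd") are assumptions.


def product_kernel(left: str, right: str) -> str:
    """Pick a kernel for left * right, exploiting at most one structured operand."""
    # Checked in this order by choice: TRMM (B := alpha*op(A)*B, A triangular) first,
    # then SYMM (A symmetric); any other operand is treated as a general matrix.
    for feature, kernel in (("triangular", "TRMM"), ("symmetric", "SYMM")):
        if feature in (left, right):
            return kernel
    return "GEMM"  # both operands treated as general


def solve_kernel(coefficient: str) -> str:
    """Pick a kernel for solving A X = B, based on the features of A."""
    return {
        "triangular": "TRSM",  # triangular solve (BLAS)
        "spd": "POSV",         # Cholesky factorization + solve (LAPACK)
        "symmetric": "SYSV",   # symmetric indefinite (Bunch-Kaufman) factorization + solve (LAPACK)
    }.get(coefficient, "GESV")  # LU with partial pivoting + solve (LAPACK)
```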
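
Inference tables like those in Figure 4 propagate structure and properties to intermediate results, so that later kernel calls can exploit them. The sketch below encodes a few well-known rules. For example, the product of two lower-triangular matrices is lower triangular, and the inverse of an SPD matrix is SPD. The modeled structures and properties (including orthogonality as a property) and all function names are assumptions, not the paper's tables.

```python
from typing import Optional

# Simplified, hypothetical inference rules in the spirit of Figure 4.
STRUCTURES = {"general", "lower", "upper", "diagonal"}


def _check(*structures: str) -> None:
    """Reject structures this sketch does not model (e.g., symmetric)."""
    for s in structures:
        if s not in STRUCTURES:
            raise ValueError(f"unsupported structure: {s}")


def product_structure(a: str, b: str) -> str:
    """Structure of the product of an a-structured and a b-structured matrix."""
    _check(a, b)
    if a == "diagonal":
        return b  # D X keeps the structure of X
    if b == "diagonal":
        return a  # X D keeps the structure of X
    if a == b:
        return a  # lower*lower is lower, upper*upper is upper
    return "general"  # e.g., lower*upper has no structure in general


def transpose_structure(a: str) -> str:
    _check(a)
    return {"lower": "upper", "upper": "lower"}.get(a, a)


def inverse_structure(a: str) -> str:
    _check(a)
    return a  # inverting a nonsingular triangular or diagonal matrix keeps its structure


def unary_property(prop: Optional[str]) -> Optional[str]:
    """Property after transposition or inversion: SPD and orthogonal are preserved."""
    return prop if prop in ("spd", "orthogonal") else None


def product_property(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Orthogonal times orthogonal is orthogonal; SPD times SPD need not be symmetric."""
    return "orthogonal" if a == b == "orthogonal" else None
```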
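
The quantity plotted in Figure 5 can be computed as follows. For each instance, take the cheapest variant in the set $\mathcal{S}$ and divide its cost by the optimal cost. Then report the percentage of instances whose ratio is at most $x_0$. This sketch assumes a toy chain $A_1 A_2 A_3$ of general matrices. Only two parenthesizations exist for it, so the optimum is simply the cheaper of the two. The instances and function names are hypothetical.

```python
from typing import Callable, Sequence

Instance = tuple[int, ...]
CostFn = Callable[[Instance], float]


def best_ratio(variants: Sequence[CostFn], optimum: CostFn, q: Instance) -> float:
    """Cost of the cheapest variant in the set, divided by the optimal cost, for sizes q."""
    return min(cost(q) for cost in variants) / optimum(q)


def ecdf_percent(ratios: Sequence[float], x0: float) -> float:
    """Percentage of instances whose ratio over the optimum is at most x0."""
    return 100.0 * sum(r <= x0 for r in ratios) / len(ratios)


# Toy setting (assumption): A1 A2 A3, A_i of size q[i-1] x q[i], 2mnk FLOPs per product.
def left_to_right(q: Instance) -> float:
    return 2 * q[0] * q[1] * q[2] + 2 * q[0] * q[2] * q[3]


def right_to_left(q: Instance) -> float:
    return 2 * q[1] * q[2] * q[3] + 2 * q[0] * q[1] * q[3]


def optimum(q: Instance) -> float:
    return min(left_to_right(q), right_to_left(q))


instances = [(10, 100, 5, 50), (50, 5, 100, 10), (10, 10, 10, 10)]
singleton = [best_ratio([left_to_right], optimum, q) for q in instances]
both = [best_ratio([left_to_right, right_to_left], optimum, q) for q in instances]
```

Here the singleton {left-to-right} attains the optimum on two of the three instances but costs 10 times the optimum on (50, 5, 100, 10). Its ECDF value at $x_0 = 1.15$ is therefore about 66.7%, while the set containing both variants reaches 100% at $x_0 = 1$.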

Theorems & Definitions (5)

  • Definition 1: Linear Algebra Mapping Problem
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2