Using MLIR Transform to Design Sliced Convolution Algorithm

Victor Ferrari, Marcio Pereira, Lucas Alvarenga, Gustavo Leite, Guido Araujo

TL;DR

This work introduces SConvTransform, a Transform-dialect-based framework for optimizing 2D convolutions in MLIR. It lowers Linalg convolution ops into tiled, packed code through a declarative pipeline guided by Convolution Slicing Analysis (CSA). The pipeline combines edge-case splitting, affine-map-based packing, and two-level tiling to produce a macrokernel that calls architecture-specific microkernels (OpenBLAS SGEMM) while preserving correctness across targets. The approach is shown to be complete and general across ConvBench workloads and several architectures, and it remains portable even though it does not target peak performance. The main current limitation is the reliance on OpenBLAS microkernels; planned future work includes automatic padding, full in-MLIR microkernel lowering, and vector-based packing to improve efficiency and portability.

Abstract

This paper proposes SConvTransform, a Transform dialect extension that provides operations for optimizing 2D convolutions in MLIR. Its main operation, SConvOp, lowers Linalg convolutions into tiled and packed generic operations through a fully declarative transformation pipeline. The process is guided by a Convolution Slicing Analysis that determines tile sizes and data layout strategies based on input and filter shapes, as well as target architecture parameters. SConvOp handles edge cases by splitting irregular regions and adjusting affine maps where needed. All packing and tiling operations are derived from a parametric set of affine equations, enabling reusable and analyzable transformations. Although functional correctness was the primary goal of this work, the experimental evaluation demonstrates the effectiveness of SConvTransform, which achieves adequate performance across different target architectures. When applied to standard convolution configurations, the generated code achieves up to 60% of peak performance on ARM SME and 67% on Intel AVX512. These results validate the benefit of combining static shape analysis with structured tiling and packing strategies within the MLIR Transform dialect. Furthermore, the modular design of SConvTransform facilitates integration with future extensions, enabling continued optimization of convolution workloads through MLIR's extensible compilation infrastructure. Future work will focus on optimizing performance and porting to other target devices.
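
The tile, pack, and microkernel structure described above can be illustrated with a small sketch. It is not the authors' implementation and does not use MLIR: it is a plain-Python model, assuming a stride-1, unpadded convolution, with hypothetical names (`sliced_conv2d`, `microkernel`, `tile_c`, `tile_n`) and arbitrary tile sizes rather than ones chosen by a slicing analysis. Edge tiles are handled here by clamping tile bounds, a simpler stand-in for the region splitting the paper describes.

```python
def direct_conv2d(inp, flt):
    """Reference stride-1, unpadded convolution.
    inp is indexed [c][h][w]; flt is indexed [m][c][kh][kw]."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, KH, KW = len(flt), len(flt[0][0]), len(flt[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        for oh in range(OH):
            for ow in range(OW):
                acc = 0.0
                for c in range(C):
                    for kh in range(KH):
                        for kw in range(KW):
                            acc += inp[c][oh + kh][ow + kw] * flt[m][c][kh][kw]
                out[m][oh][ow] = acc
    return out


def microkernel(a, b, c_tile):
    """Stand-in for an SGEMM microkernel: c_tile += a @ b."""
    for i, a_row in enumerate(a):
        for k, a_ik in enumerate(a_row):
            b_row = b[k]
            for j in range(len(b_row)):
                c_tile[i][j] += a_ik * b_row[j]


def sliced_conv2d(inp, flt, tile_c=2, tile_n=5):
    """Macrokernel sketch: tile over input channels and over flattened
    output positions, pack each tile, then call the microkernel."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, KH, KW = len(flt), len(flt[0][0]), len(flt[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    N = OH * OW  # flattened output positions
    out_flat = [[0.0] * N for _ in range(M)]

    for c0 in range(0, C, tile_c):           # outer tiling: channels
        c1 = min(c0 + tile_c, C)             # clamp the edge tile
        # Pack the filter tile as an M x K matrix, K ordered (c, kh, kw).
        packed_flt = [[flt[m][c][kh][kw]
                       for c in range(c0, c1)
                       for kh in range(KH)
                       for kw in range(KW)]
                      for m in range(M)]
        for n0 in range(0, N, tile_n):       # inner tiling: output positions
            n1 = min(n0 + tile_n, N)         # clamp the edge tile
            # Pack the input tile as a K x n matrix with the same K order.
            packed_in = [[inp[c][n // OW + kh][n % OW + kw]
                          for n in range(n0, n1)]
                         for c in range(c0, c1)
                         for kh in range(KH)
                         for kw in range(KW)]
            c_tile = [[0.0] * (n1 - n0) for _ in range(M)]
            microkernel(packed_flt, packed_in, c_tile)
            # Accumulate partial sums across channel tiles.
            for m in range(M):
                for j in range(n1 - n0):
                    out_flat[m][n0 + j] += c_tile[m][j]

    # Unflatten to [m][oh][ow].
    return [[out_flat[m][oh * OW:(oh + 1) * OW] for oh in range(OH)]
            for m in range(M)]
```

In this sketch the packed buffers play the role of the contiguous operands a GEMM microkernel expects, and the index expressions `n // OW + kh` and `n % OW + kw` are the kind of affine-style mapping that a packing step has to encode.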

Paper Structure

This paper contains 31 sections, 8 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Convolution tensor layout.
  • Figure 2: Simplified compilation flow for a machine learning compiler using SConv to optimize convolutions.
  • Figure 3: Convolution macro-kernel as generated by the Convolution Slicing Optimization pass of the SConv algorithm.
  • Figure 4: SConvTransform compilation flow.
  • Figure 5: Performance metrics for Apple M4 processor with SME and Intel i7-11700K processor with AVX512.