SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

Amir Ofir, Gil Ben-Artzi

TL;DR

The paper presents a novel approach for accelerating convolutions during inference on CPU-based architectures; it takes advantage of scalar-matrix multiplication and reduces memory overhead.

Abstract

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can suffer from inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplication, it is not well suited to convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
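To make the baseline described above concrete, the following is a minimal sketch of im2col followed by a matrix product, for a single input channel with no padding and stride 1. The function names `im2col` and `conv_im2col` are illustrative assumptions, not taken from the paper, and the plain-Python loops stand in for an optimized GEMM call.

```python
def im2col(image, k):
    """Pack every k x k patch of a 2-D image into one column of a matrix."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    # One row per kernel element, one column per output position.
    cols = [[0] * (out_h * out_w) for _ in range(k * k)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(k):
                for dj in range(k):
                    cols[di * k + dj][i * out_w + j] = image[i + di][j + dj]
    return cols


def conv_im2col(image, kernel):
    """Convolve (cross-correlate) by multiplying flattened weights with the im2col matrix."""
    k = len(kernel)
    out_w = len(image[0]) - k + 1
    cols = im2col(image, k)
    # Flatten the kernel into a 1 x (k*k) weight row.
    weights = [kernel[di][dj] for di in range(k) for dj in range(k)]
    # Row-vector times matrix: the stand-in for GEMM in this sketch.
    flat = [
        sum(weights[r] * cols[r][c] for r in range(k * k))
        for c in range(len(cols[0]))
    ]
    # Reshape the flat result back into a 2-D output map.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
packed = im2col(image, 3)
result = conv_im2col(image, ones)
```

For a 4 x 4 image and a 3 x 3 kernel, the packed matrix has 9 rows and 4 columns, so 16 input values are expanded into 36 stored values. That duplication is the memory overhead the abstract refers to.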

Paper Structure

This paper contains 24 sections, 1 equation, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Im2col operation (the arrow on the right) with a $3 \times 3$ kernel on a single input channel image. The product is a matrix of 9 rows and 4 columns. The highlighted slice in the image corresponds to the highlighted column. There is a significant overlap between each pair of consecutive columns.
  • Figure 2: Our approach. The result of convolutions of $9$ consecutive positions with a $3 \times 3$ kernel can be viewed as a linear combination of shifted sub-matrices. We extract a sub-matrix of the input tensor and use scalar matrix multiplication with shifted blocks to compute the results.
  • Figure 3: Acceleration of convolutional layers in various neural networks. The x-axis is the depth of the layer and the y-axis is the speedup, normalized to im2col convolution.
  • Figure 4: Acceleration of input channels. The x-axis is the number of input channels and the y-axis is the speedup, normalized to im2col convolution.
  • Figure 5: A comparison of the speedups for different square input sizes. The x-axis represents the first dimension of the input, and the y-axis represents the speedup, normalized to im2col convolution.
  • ...and 2 more figures
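
The idea in the Figure 2 caption, that the output is a linear combination of shifted sub-matrices of the input, can be sketched as follows. This is an illustration under stated assumptions (single channel, no padding, stride 1), not the authors' implementation: it omits the packing and blocking details of the paper, and the name `conv_scalar_matrix` is hypothetical.

```python
def conv_scalar_matrix(image, kernel):
    """Accumulate kernel[di][dj] * (input sub-matrix shifted by (di, dj))."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for di in range(k):
        for dj in range(k):
            w = kernel[di][dj]  # one scalar weight
            # Scale the shifted sub-matrix by the scalar and add it to the
            # output in place; no im2col buffer is materialized.
            for i in range(out_h):
                src = image[i + di]
                row = out[i]
                for j in range(out_w):
                    row[j] += w * src[j + dj]
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
sobel_x = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
result = conv_scalar_matrix(image, sobel_x)
```

Each of the nine kernel weights contributes one scalar-times-matrix update over contiguous rows of the input, so the only extra storage is the output itself.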