
Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sanjali Yadav, Bahar Asgari

TL;DR

The paper addresses performance variability in sparse matrix-matrix multiplication (SpGEMM) caused by irregular sparsity. It proposes ML-based adaptive dataflow selection among inner-product (IP), outer-product (OP), and row-wise (RW) schemes. The authors develop decision-tree (DT) and deep reinforcement learning selectors and compare them against a heuristic baseline, using a dataset of real and synthetic matrices evaluated with cycle-accurate simulators. Decision trees achieve about 94% accuracy with substantial average speedups (approximately 2.7x over IP and 2.64x over RW). A deep Q-network reaches around 90% accuracy with comparable gains but occasionally makes suboptimal choices. A DT-guided heuristic offers a storage-light alternative. The work shows that ML-driven dataflow selection can significantly improve SpGEMM performance across diverse sparsity patterns. It outlines future directions in online RL, state pruning, and broader deployment across SpGEMM hardware accelerators.
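The accuracy and speedup figures above can be made concrete with a small sketch. It assumes two definitions. Accuracy is the fraction of experiments in which the selected dataflow matches the fastest one in simulation. Average speedup over a fixed baseline is the arithmetic mean of per-experiment cycle ratios; the paper may aggregate differently, for example with a geometric mean. The cycle counts are toy values, not results from the paper, and the function names are hypothetical.

```python
from statistics import mean

# Dataflow labels used in the summary: inner-product, outer-product, row-wise.
DATAFLOWS = ("IP", "OP", "RW")


def oracle(cycles):
    """Return the dataflow with the fewest simulated cycles for one experiment."""
    return min(DATAFLOWS, key=lambda d: cycles[d])


def accuracy(predictions, cycle_table):
    """Fraction of experiments where the predicted dataflow is the oracle's choice."""
    hits = sum(pred == oracle(c) for pred, c in zip(predictions, cycle_table))
    return hits / len(cycle_table)


def mean_speedup_over(baseline, predictions, cycle_table):
    """Arithmetic mean of baseline cycles / cycles of the predicted dataflow (assumed definition)."""
    return mean(c[baseline] / c[pred] for pred, c in zip(predictions, cycle_table))


# Toy cycle counts for three hypothetical SpGEMM experiments (not from the paper).
cycle_table = [
    {"IP": 1000, "OP": 400, "RW": 500},
    {"IP": 300, "OP": 900, "RW": 600},
    {"IP": 800, "OP": 700, "RW": 200},
]
predictions = ["OP", "IP", "OP"]  # the third pick misses the oracle's "RW"

print(accuracy(predictions, cycle_table))  # 2 of 3 picks match the oracle
print(mean_speedup_over("IP", predictions, cycle_table))
```

A wrong choice can still beat the baseline, as in the third toy experiment (700 vs. 800 cycles). Accuracy and speedup therefore measure different things and are reported separately.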

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators are tailored to specific sparsity patterns. Each uses a fixed dataflow scheme (inner-product, outer-product, or row-wise) and often performs suboptimally when the actual sparsity deviates from the pattern it was designed for. SpGEMM is spreading across domains with distinct sparsity characteristics, so demand is growing for hardware accelerators that can efficiently handle a range of sparsity patterns. This paper presents a machine-learning-based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks with diverse sparsity patterns. Using decision trees and deep reinforcement learning, we explore whether these techniques can surpass heuristic-based methods in identifying optimal dataflow schemes. We evaluate our models against a heuristic baseline, highlighting the strengths and weaknesses of each approach. Our findings suggest that machine learning for dynamic dataflow selection in hardware accelerators can provide gains of up to 28x.
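The three dataflow schemes compute the same product but order the multiply-accumulate work differently. The sketch below uses textbook formulations on a dictionary-of-keys sparse format, {(row, col): value}. It is not the accelerator implementation evaluated in the paper, and the function names are illustrative.

```python
def to_rows(m):
    """Group {(i, j): v} into {i: {j: v}} (row-major view)."""
    rows = {}
    for (i, j), v in m.items():
        rows.setdefault(i, {})[j] = v
    return rows


def to_cols(m):
    """Group {(i, j): v} into {j: {i: v}} (column-major view)."""
    cols = {}
    for (i, j), v in m.items():
        cols.setdefault(j, {})[i] = v
    return cols


def inner_product(A, B):
    """IP: each C[i, j] is a dot product of row i of A with column j of B."""
    a_rows, b_cols = to_rows(A), to_cols(B)
    C = {}
    for i, a_row in a_rows.items():
        for j, b_col in b_cols.items():
            common = a_row.keys() & b_col.keys()  # index intersection; often empty
            if common:
                C[(i, j)] = sum(a_row[k] * b_col[k] for k in common)
    return C


def outer_product(A, B):
    """OP: column k of A times row k of B gives a partial matrix; partials are merged."""
    a_cols, b_rows = to_cols(A), to_rows(B)
    C = {}
    for k, a_col in a_cols.items():
        for i, a in a_col.items():
            for j, b in b_rows.get(k, {}).items():
                C[(i, j)] = C.get((i, j), 0) + a * b  # merge partial products
    return C


def row_wise(A, B):
    """RW (Gustavson): row i of C sums rows of B scaled by the nonzeros of A[i, :]."""
    a_rows, b_rows = to_rows(A), to_rows(B)
    C = {}
    for i, a_row in a_rows.items():
        acc = {}  # accumulator for output row i only
        for k, a in a_row.items():
            for j, b in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a * b
        C.update({(i, j): v for j, v in acc.items()})
    return C


A = {(0, 0): 1, (0, 2): 2, (1, 1): 3}  # 2 x 3
B = {(0, 1): 4, (1, 0): 5, (2, 1): 6}  # 3 x 2
print(inner_product(A, B) == outer_product(A, B) == row_wise(A, B))  # True
```

Each scheme pays a different cost. Inner product spends work on index intersections that may be empty. Outer product produces partial-product matrices that must be merged. Row-wise keeps one accumulator per output row. Which cost dominates depends on the sparsity pattern, which is why no single fixed scheme wins everywhere.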

Paper Structure (6 sections, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Sparsity Analysis -- Illustrates the relationship between the sparsity of input matrix A (x-axis), the sparsity of input matrix B (bubble size), the average number of nonzeros per row in matrix A (color depth), and the latency (total number of cycles) of the three dataflow schemes across various SpGEMM experiments.
  • Figure 2: Decision Tree Structure -- Decision-making involves navigating from the root to the relevant leaf, assessing conditions at each node along the path.
  • Figure 3: RL Structure -- At each timestep, the agent perceives the current state and uses its knowledge base to take an action. A reward is assigned to provide feedback.
  • Figure 4: Feature Selection -- Analysis of the feature importance in the decision tree.
  • Figure 5: Performance of Decision Tree Model -- The speedup of the decision tree model over applying IP, OP, RW, and the heuristic (H) for SpGEMM operations. Note: the y-axis is truncated in both graphs; the number above each bar gives the actual speedup.
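Figures 1 and 2 suggest the shape of a selector. It computes a few sparsity features, then walks a tree from the root to a leaf that names a dataflow. The sketch below uses the three features plotted in Figure 1 and assumes that sparsity means the fraction of zero entries. The tree, its thresholds, and its leaf labels are hypothetical and are not the learned model from the paper.

```python
from dataclasses import dataclass
from typing import Union


def features(A, shape_a, B, shape_b):
    """Features plotted in Figure 1 for A (m x k) and B (k x n) in {(row, col): value} form.

    Assumes sparsity = fraction of zero entries.
    """
    (m, k), (k_b, n) = shape_a, shape_b
    if k != k_b:
        raise ValueError("inner dimensions must match")
    return {
        "sparsity_a": 1 - len(A) / (m * k),
        "sparsity_b": 1 - len(B) / (k_b * n),
        "avg_nnz_row_a": len(A) / m,
    }


@dataclass
class Node:
    """Internal node: go left if feats[feature] <= threshold, else right."""
    feature: str
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Node, str]  # a leaf is just a dataflow label such as "IP", "OP", "RW"


def select_dataflow(tree: Tree, feats: dict) -> str:
    """Walk from the root to a leaf, testing one feature per node."""
    node = tree
    while isinstance(node, Node):
        node = node.left if feats[node.feature] <= node.threshold else node.right
    return node


# Hypothetical tree: thresholds and leaves are illustrative, not learned from data.
TOY_TREE = Node(
    "sparsity_a", 0.99,
    left=Node("avg_nnz_row_a", 8.0, left="RW", right="IP"),
    right=Node("sparsity_b", 0.99, left="RW", right="OP"),
)

A = {(0, 0): 1, (0, 2): 2, (1, 1): 3}  # 2 x 3
B = {(0, 1): 4, (1, 0): 5, (2, 1): 6}  # 3 x 2
feats = features(A, (2, 3), B, (3, 2))
print(feats, select_dataflow(TOY_TREE, feats))
```

In a trained model, the split features, thresholds, and leaves would come from fitting to simulator-labeled data. Figure 4 reports which features the paper's tree relied on most.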