Parallel Heuristic Exploration for Additive Complexity Reduction in Fast Matrix Multiplication

A. I. Perminov

TL;DR

The paper addresses the reduction of additive complexity in fast matrix multiplication schemes with ternary coefficients by introducing a massively parallel, heuristic-driven common subexpression elimination (CSE) framework. Central to the method is the Greedy-Intersections heuristic, which estimates the value of a substitution without a full trial evaluation, so that thousands of reduction processes can run concurrently in the GPU and OpenMP CPU implementations. Across 149 schemes, the approach improves on the Greedy-Potential (GP) baseline for 102 of them, including 57 new best-known addition counts for optimal-rank schemes, and runs significantly faster, making automated additive-reduction exploration practical. The work also emphasizes component-wise optimization and provides open-source tools for broader adoption and hybrid optimization.

Abstract

This paper presents a parallel random-search method for reducing additive complexity in fast matrix multiplication algorithms with ternary coefficients $\{-1,0,1\}$. The approach replaces expensive exact evaluation with fast heuristic scoring, including the new Greedy-Intersections strategy. The method runs many independent common subexpression elimination processes in parallel, exploring the search space through random pair substitutions and diverse selection strategies while sharing promising partial solutions. Tested on 149 ternary-coefficient schemes, the method achieves lower addition counts than the state-of-the-art Greedy-Potential on 102 schemes (including 57 new best-known results for optimal-rank schemes), matches it on 45, and is outperformed on only 2. For most schemes, it provides equal or better results while being significantly faster, making it practical for algorithm exploration. All software and results are open source.
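To make the idea of pair-substitution CSE concrete, here is a minimal Python sketch. It is not the paper's implementation and does not reproduce Greedy-Intersections: it assumes a plain frequency score, picks at random among the most frequent signed pairs, and simulates the "many independent processes" with a sequential loop of restarts. All names (`count_additions`, `pair_counts`, `substitute`, `reduce_once`, `explore`) and the `top_k` parameter are hypothetical. Each expression is a dict mapping a variable index to a coefficient in {-1, 1}.

```python
import random
from collections import Counter
from itertools import combinations


def count_additions(exprs):
    """Additions needed to evaluate every expression term by term."""
    return sum(max(len(e) - 1, 0) for e in exprs)


def pair_counts(exprs):
    """Count occurrences of each signed pair (i, j, relative sign)."""
    counts = Counter()
    for e in exprs:
        for i, j in combinations(sorted(e), 2):
            counts[(i, j, e[i] * e[j])] += 1
    return counts


def substitute(exprs, pair, new_var):
    """Replace x_i + s*x_j by new_var in place; return new_var's definition."""
    i, j, s = pair
    for e in exprs:
        if i in e and j in e and e[i] * e[j] == s:
            c = e.pop(i)      # sign of the whole pair inside this expression
            e.pop(j)
            e[new_var] = c
    return {i: 1, j: s}


def reduce_once(exprs, n_vars, rng, top_k=3):
    """One randomized reduction run (assumed frequency-based scoring)."""
    exprs = [dict(e) for e in exprs]   # work on a copy
    definitions = []
    next_var = n_vars
    while True:
        # Only pairs occurring at least twice are candidates here;
        # a pair seen once would cost one addition and save none.
        candidates = [p for p, c in pair_counts(exprs).most_common(top_k) if c >= 2]
        if not candidates:
            break
        definitions.append(substitute(exprs, rng.choice(candidates), next_var))
        next_var += 1
    # Each definition costs exactly one addition.
    return count_additions(exprs) + len(definitions), exprs, definitions


def explore(exprs, n_vars, runs=100, seed=0):
    """Keep the best of several independent randomized runs."""
    rng = random.Random(seed)
    return min((reduce_once(exprs, n_vars, rng) for _ in range(runs)),
               key=lambda result: result[0])


# Toy input: x0+x1+x2, x0+x1-x3, -x0-x1+x2 over variables x0..x3.
exprs = [{0: 1, 1: 1, 2: 1}, {0: 1, 1: 1, 3: -1}, {0: -1, 1: -1, 2: 1}]
best_total, best_exprs, best_defs = explore(exprs, n_vars=4, runs=10)
print(count_additions(exprs), "->", best_total)
```

On this toy input the shared term x0+x1 (up to sign) is extracted once and reused three times, so the addition count drops from 6 to 4. In a genuinely parallel setting each call to `reduce_once` would run as its own worker with its own random generator, which is the structure the abstract describes at a much larger scale.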

Paper Structure

This paper contains 36 sections, 14 equations, 6 tables, 1 algorithm.