A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations

Yuan Tang

TL;DR

The paper reexamines the communication bandwidth cost analysis of the recursive TRSM algorithm for solving $L X = B$ with $L \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times k}$, focusing on the accuracy of the cost categorization. It critiques the CARMA-inspired three-case partitioning of costs, identifying coverage gaps in the conditions for the regimes with two or three large dimensions, as well as mis-specified bandwidth bounds. The author proposes corrections to the bandwidth scaling in key regimes: a $\beta$-bound on the order of $(n^2 + nk) \log p / \sqrt{p}$, and, with $p_r^2 = np/k$, a bound of $O((nk^2/p)^{2/3})$ in certain cases. Overall, the work emphasizes the need for alignment between problem degrees of freedom and cost models to ensure accurate, scalable analyses of communication-avoiding parallel TRSM.

Abstract

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify potential issues that require clarification or revision. In particular, we address inconsistencies and miscalculations in the categorization of costs into three scenarios based on the relationship between matrix dimensions and processor count. Our findings contribute to the ongoing discourse in the field and pave the way for further improvements in this area of research.

Paper Structure

This paper contains 5 sections, 3 figures.

Figures (3)

  • Figure 1: The problematic condition of $n > k\sqrt{p}$ for "two large dimensions" exhibits gaps; the bandwidth bound associated with $\beta$ should be $\frac{(n^2 + nk) \log p}{\sqrt{p}}$.
  • Figure 2: The problematic condition of $k/p < n < k/\sqrt{p}$ for "three large dimensions" exhibits gaps.
  • Figure 3: The incorrect bandwidth bound of "three large dimensions": with the proposed $p_r^2 = np/k$, the bandwidth bound associated with $\beta$ should be $\left(\frac{nk^2}{p}\right)^{2/3}$.
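
The conditions and bounds quoted in the figure captions can be explored numerically. The following is a minimal sketch, not the paper's own code: the function names are hypothetical, constant factors are dropped, the logarithm is assumed to be base 2, and only the two conditions quoted above are encoded (the condition for the remaining case of the three-case partitioning is not stated in this summary, so it is left out).

```python
import math


def is_two_large_dims(n, k, p):
    # Condition quoted in Figure 1 for "two large dimensions": n > k * sqrt(p).
    return n > k * math.sqrt(p)


def is_three_large_dims(n, k, p):
    # Condition quoted in Figure 2 for "three large dimensions":
    # k / p < n < k / sqrt(p).
    return k / p < n < k / math.sqrt(p)


def classify(n, k, p):
    # Returns which of the two quoted conditions holds, if any.
    # "neither quoted condition" only means that these two conditions do not
    # cover the point; the third case of the partitioning is not modeled here.
    if is_two_large_dims(n, k, p):
        return "two large dimensions"
    if is_three_large_dims(n, k, p):
        return "three large dimensions"
    return "neither quoted condition"


def beta_two_large_dims(n, k, p):
    # Bound from Figure 1: (n^2 + n*k) * log(p) / sqrt(p).
    # Assumption: log base 2; constant factors omitted.
    return (n ** 2 + n * k) * math.log2(p) / math.sqrt(p)


def beta_three_large_dims(n, k, p):
    # Bound from Figure 3 (with p_r^2 = n*p/k): (n * k^2 / p)^(2/3).
    # Constant factors omitted.
    return (n * k ** 2 / p) ** (2.0 / 3.0)


if __name__ == "__main__":
    p = 16
    for n, k in [(8192, 1024), (128, 1024), (1024, 1024)]:
        print(n, k, p, classify(n, k, p))
```

With $p = 16$, a square-like instance such as $n = k = 1024$ satisfies neither quoted condition, since $k/\sqrt{p} = 256 < n < k\sqrt{p} = 4096$. Whether such a point is covered elsewhere depends on the third condition, which is not quoted here.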