
Graph-based Algorithms for Linear Computation Coding

Hans Rosenberger, Ali Bereyhi, Ralf R. Müller

TL;DR

This work addresses the efficient computation of $\mathbf{y}=\mathbf{T}\mathbf{x}$ for a constant matrix $\mathbf{T}$ by reframing linear computation coding (LCC) as a DAG-based decomposition problem that jointly optimizes operation count and parallelism. It introduces a mixed algorithm (UA) that controls the DAG depth through a depth parameter $\Delta\mu_\mathrm{max}$ and a depth-based penalty, bridging the fully sequential (FS) and fully parallel (FP) approaches. Using hardware-driven cost modeling and extensive simulations, the authors show that UA, when constrained to maintain a parallel structure, consistently achieves a lower total cost than FS, FP, and existing baselines, especially in pipelined implementations. The results demonstrate that DAG-aware LCC is practical for large-scale, real-time linear mappings and is directly relevant to hardware-constrained neural network inference and signal processing.

Abstract

We revisit existing linear computation coding (LCC) algorithms, and introduce a new framework that measures the computational cost of computing multidimensional linear functions, not only in terms of the number of additions, but also with respect to their suitability for parallel processing. Utilizing directed acyclic graphs, which correspond to signal flow graphs in hardware, we propose a novel LCC algorithm that controls the trade-off between the total number of operations and their parallel executability. Numerical evaluations show that the proposed algorithm, constrained to a fully parallel structure, outperforms existing schemes.

Paper Structure (13 sections, 19 equations, 4 figures)

Figures (4)

  • Figure 1: (a) A DAG realizing the function $y(x_1,x_2)=(21/8)x_2-(5/4)x_1$. (b) The same DAG extended with delay elements to enable pipelining.
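The DAG drawn in Figure 1(a) is not reproduced in this summary. The sketch below therefore shows one possible realization of the same function, which may differ from the one in the figure. It rests on an assumption that is ours and is not taken from Definition 1: a fundamental operation has the form $2^a u \pm 2^b v$ for two earlier vertices $u$ and $v$. Under that form, the power-of-two factors are wired shifts and each operation costs one adder. The helper `fundamental_op` is hypothetical. Exact `Fraction` arithmetic is used so that right shifts (negative exponents) lose no precision.

```python
from fractions import Fraction

def fundamental_op(u, v, a, b, sign):
    """Hypothetical fundamental operation: 2**a * u + sign * 2**b * v.

    The power-of-two factors are assumed to be free wired shifts, so each
    call models exactly one adder/subtractor. Negative a or b are right shifts.
    """
    return Fraction(2) ** a * u + sign * Fraction(2) ** b * v

def y_dag(x1, x2):
    """Evaluate y = (21/8) x2 - (5/4) x1 with three fundamental operations."""
    t = fundamental_op(x2, x1, 1, 0, -1)   # t = 2 x2 - x1          (adder 1)
    s = fundamental_op(t, t, 2, 0, +1)     # s = 4 t + t = 5 t      (adder 2)
    y = fundamental_op(s, x2, -2, -3, +1)  # y = s/4 + x2/8         (adder 3)
    return y

print(y_dag(8, 8))  # 11
```

Three operations suffice here because the intermediate result $t = 2x_2 - x_1$ is reused. The resulting DAG has depth 3.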
  • Figure 2: Resulting graph topologies of different algorithmic approaches for decomposing a target matrix $\mathbf{T}$ of dimension $6 \times 2$. Green nodes are input vertices (elements of the input vector $\mathbf{x}$), red nodes are output vertices (elements of the matrix-vector product $\mathbf{y}$), and blue nodes are intermediate vertices of the decomposition graph.
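A topology's sequential or parallel character appears in its depth, the longest chain of adders between an input and an output vertex. This is a minimal sketch that assumes every intermediate vertex is a two-input adder. The DAG is stored as a hypothetical map from each vertex to its predecessors. For brevity it uses a four-input sum instead of the $6 \times 2$ matrices of Figure 2.

```python
def dag_metrics(preds):
    """Return (number of adders, depth) of a DAG.

    preds maps each vertex to the list of vertices it adds; input vertices
    map to an empty list. Every non-input vertex counts as one adder.
    """
    depth = {}

    def level(v):
        # Depth = 0 for inputs, else one more than the deepest predecessor.
        if v not in depth:
            ps = preds[v]
            depth[v] = 0 if not ps else 1 + max(level(p) for p in ps)
        return depth[v]

    for v in preds:
        level(v)
    n_add = sum(1 for ps in preds.values() if ps)
    return n_add, max(depth.values())

inputs = {f"x{i}": [] for i in range(1, 5)}

# Sequential topology: y = ((x1 + x2) + x3) + x4
chain = {**inputs, "a": ["x1", "x2"], "b": ["a", "x3"], "y": ["b", "x4"]}

# Parallel topology: y = (x1 + x2) + (x3 + x4)
tree = {**inputs, "a": ["x1", "x2"], "b": ["x3", "x4"], "y": ["a", "b"]}

print(dag_metrics(chain))  # (3, 3)
print(dag_metrics(tree))   # (3, 2)
```

Both topologies use three adders, but the parallel one finishes after two adder stages instead of three.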
  • Figure 3: Comparison of different algorithmic approaches for decomposing a $64 \times 4$ target matrix $\mathbf{T}$. Solid lines show results for the total cost $C_\mathrm{total}$; dashed lines consider only the adder cost $C_\mathrm{add} N_\mathrm{add}$. MCM refers to the algorithm presented in Voronenko_2007 (using the C++ implementation available on Spiral_2007, extended by our hardware model). The results for each algorithm are averaged over $10^5$ matrix entries.
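This summary does not give the exact hardware model behind $C_\mathrm{total}$. As a hypothetical stand-in, the sketch below assumes $C_\mathrm{total} = C_\mathrm{add} N_\mathrm{add} + C_\mathrm{reg} N_\mathrm{reg}$, where $N_\mathrm{reg}$ counts the delay elements needed to pipeline the DAG. It applies three rules:

- A value produced at depth $d_u$ and consumed at depth $d_v$ must be delayed by $d_v - d_u - 1$ cycles.
- Outputs are padded to the largest output depth.
- Each vertex has one delay chain, shared by all of its consumers.

The function `pipelined_cost` and the weights `c_add` and `c_reg` are illustrative assumptions.

```python
def pipelined_cost(preds, outputs, c_add, c_reg):
    """Hypothetical C_total = c_add * N_add + c_reg * N_reg of a pipelined DAG.

    preds maps each vertex to its predecessor list (empty for inputs).
    Returns (total cost, number of adders, number of delay elements).
    """
    depth = {}

    def level(v):
        if v not in depth:
            ps = preds[v]
            depth[v] = 0 if not ps else 1 + max(level(p) for p in ps)
        return depth[v]

    for v in preds:
        level(v)
    d_max = max(depth[o] for o in outputs)

    # Length of the (shared) delay chain each vertex must drive.
    lag = {v: 0 for v in preds}
    for v, ps in preds.items():
        for p in ps:
            lag[p] = max(lag[p], depth[v] - depth[p] - 1)
    for o in outputs:
        lag[o] = max(lag[o], d_max - depth[o])  # align all outputs in time

    n_add = sum(1 for ps in preds.values() if ps)
    n_reg = sum(lag.values())
    return c_add * n_add + c_reg * n_reg, n_add, n_reg

inputs = {f"x{i}": [] for i in range(1, 5)}
chain = {**inputs, "a": ["x1", "x2"], "b": ["a", "x3"], "y": ["b", "x4"]}
tree = {**inputs, "a": ["x1", "x2"], "b": ["x3", "x4"], "y": ["a", "b"]}

print(pipelined_cost(chain, ["y"], c_add=1.0, c_reg=0.5))  # (4.5, 3, 3)
print(pipelined_cost(tree, ["y"], c_add=1.0, c_reg=0.5))   # (3.0, 3, 0)
```

Both DAGs use the same number of adders. The sequential chain needs three balancing registers, while the balanced tree needs none. This illustrates why a parallel structure can pay off once pipelining costs are included.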
  • Figure 4: Comparison of different depth parameters $\Delta\mu_\mathrm{max}$ of UA for a $16 \times 4$ target matrix $\mathbf{T}$. Solid lines show results for the total cost $C_\mathrm{total}$; dashed lines consider only the adder cost $C_\mathrm{add} N_\mathrm{add}$. The results for each algorithm are averaged over $10^5$ matrix entries.

Theorems & Definitions (4)

  • Definition 1: Fundamental Operation
  • Definition 2: CMVM Problem
  • Remark 1
  • Remark 2
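Definition 2 appears here only by title. We assume it refers to the standard constant matrix-vector multiplication (CMVM) setting: computing $\mathbf{y}=\mathbf{T}\mathbf{x}$ for a fixed, integer-valued $\mathbf{T}$ using only shifts and additions. A common reference point in that setting writes every entry in canonical signed-digit (CSD) form and sums each row digit by digit. This needs one adder fewer than the number of nonzero digits in the row. It is a baseline, not the proposed algorithm. A minimal sketch with hypothetical helper names:

```python
def csd_digits(n):
    """Canonical signed-digit digits of integer n, least significant first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d           # remainder is now divisible by 4
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def csd_adder_count(T):
    """Adders needed for y = T x when each row is summed digit by digit."""
    total = 0
    for row in T:
        k = sum(1 for entry in row for d in csd_digits(entry) if d != 0)
        total += max(k - 1, 0)  # k nonzero terms need k - 1 additions
    return total

print(csd_digits(21))                # [1, 0, 1, 0, 1]
print(csd_adder_count([[-10, 21]]))  # 4
```

Consider the row $[-10,\ 21]$, which computes $8y$ for the function in Figure 1. This baseline needs four adders. Reusing the intermediate result $t = 2x_2 - x_1$ gives $8y = 2(4t + t) + x_2$ with only three. Such savings from shared intermediate results are what the decomposition algorithms compared in Figure 3 target.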