Graph-based Algorithms for Linear Computation Coding

Hans Rosenberger; Ali Bereyhi; Ralf R. Müller

TL;DR

This work addresses efficient computation of $\mathbf{y}=\mathbf{T}\mathbf{x}$ with a constant matrix $\mathbf{T}$ by reframing linear computation coding (LCC) as a DAG-based decomposition problem that jointly optimizes operation count and parallelism. It introduces a mixed algorithm (ua) that controls the DAG depth via $\Delta\mu_{\max}$ and a depth-based penalty, bridging the fully sequential (fs) and fully parallel (fp) approaches. Using hardware-driven cost modeling and extensive simulations, ua, when configured to maintain a parallel structure, consistently outperforms fs, fp, and existing baselines in total cost, especially under pipelined implementations. The results indicate that DAG-aware LCC is practical for large-scale, real-time linear mappings and is directly relevant to hardware-constrained neural network inference and signal processing tasks.

Abstract

We revisit existing linear computation coding (LCC) algorithms and introduce a new framework that measures the cost of computing multidimensional linear functions not only by the number of additions, but also by their suitability for parallel processing. Using directed acyclic graphs, which correspond to signal flow graphs in hardware, we propose a novel LCC algorithm that controls the trade-off between the total number of operations and their parallel executability. Numerical evaluations show that the proposed algorithm, constrained to a fully parallel structure, outperforms existing schemes.
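To illustrate why the number of additions alone does not capture parallel executability, the following minimal sketch compares two ways of summing the same terms. It is a generic illustration, not the paper's algorithm or cost model; the function names `chain_sum_cost` and `tree_sum_cost` are hypothetical.

```python
def chain_sum_cost(n_terms):
    """Sequential accumulation ((x0 + x1) + x2) + ...: every add waits for the previous one."""
    n_add = n_terms - 1
    depth = n_terms - 1
    return n_add, depth


def tree_sum_cost(n_terms):
    """Pairwise reduction: adds on the same level are independent and can run in parallel."""
    n_add, depth = 0, 0
    width = n_terms
    while width > 1:
        pairs = width // 2
        n_add += pairs          # one adder per pair on this level
        width -= pairs          # each pair collapses into one value
        depth += 1
    return n_add, depth


print(chain_sum_cost(8))  # (7, 7)
print(tree_sum_cost(8))   # (7, 3)
```

Both structures use the same number of adders, but the tree has a much shorter longest path, which is the kind of difference a depth-aware cost measure is meant to expose.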

Paper Structure (13 sections, 19 equations, 4 figures)

Introduction
Notation
Preliminaries
Addition as a Fundamental Operation
cmvm
Computational Cost
Algorithmic Approaches
fs Algorithm
fp Algorithm
ua
Related Algorithms
Numerical Experiments
Conclusion

Figures (4)

Figure 1: A DAG realizing the function $y(x_1,x_2)=(21/8)x_2-(5/4)x_1$ is depicted in (a). The same DAG is extended in (b) with delay elements to allow for pipelining.
Figure 2: Resulting graph topologies of different algorithmic approaches for decomposing a target matrix $\mathbf{T}$ of dimension $6 \times 2$. Green nodes represent input vertices corresponding to elements of the input vector $\mathbf{x}$, red nodes represent output vertices of the resulting matrix-vector product $\mathbf{y}$, and blue nodes are intermediary vertices of the decomposition graph.
Figure 3: Comparison of different algorithmic approaches for decomposing a $64 \times 4$ target matrix $\mathbf{T}$. Solid lines indicate results considering the total cost $C_\mathrm{total}$. Dashed lines only consider the cost of adders $C_\mathrm{add} N_\mathrm{add}$. mcm refers to the algorithm presented in Voronenko_2007 (using the C++ implementation available from Spiral_2007, extended by our hardware model). The results for each algorithm are averaged over $10^5$ matrix entries.
Figure 4: Comparison of different depth parameters $\Delta\mu_\mathrm{max}$ of the ua given a $16 \times 4$ target matrix $\mathbf{T}$. Solid lines indicate results considering the total cost $C_\mathrm{total}$. Dashed lines only consider the cost of adders $C_\mathrm{add} N_\mathrm{add}$. The results for each algorithm are averaged over $10^5$ matrix entries.
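
As a worked example for the function in Figure 1, the sketch below builds one possible shift-and-add graph for $y=(21/8)x_2-(5/4)x_1$ and checks its coefficients exactly. It assumes each vertex forms a signed sum of two operands scaled by powers of two; the helper `shift_add` and the particular decomposition are illustrative assumptions and need not match the graph drawn in the figure.

```python
from fractions import Fraction


def shift_add(u, a, v, b, sign=1):
    """Vertex computing 2**a * u + sign * 2**b * v.

    Each operand is a pair: (coefficients over (x1, x2), depth in the graph).
    """
    (cu, du), (cv, dv) = u, v
    coeffs = tuple(Fraction(2) ** a * p + sign * Fraction(2) ** b * q
                   for p, q in zip(cu, cv))
    return coeffs, max(du, dv) + 1


# Input vertices: unit coefficient vectors at depth 0.
x1 = ((Fraction(1), Fraction(0)), 0)
x2 = ((Fraction(0), Fraction(1)), 0)

v1 = shift_add(x2, -1, x1, -2, sign=-1)  # x2/2 - x1/4
v2 = shift_add(v1, 2, v1, 0)             # 4*v1 + v1 = (5/2)*x2 - (5/4)*x1
y = shift_add(v2, 0, x2, -3)             # v2 + x2/8 = (21/8)*x2 - (5/4)*x1

coeffs, depth = y
print(coeffs, depth)
```

This decomposition uses three additions arranged as a chain of depth three; the shifts themselves are treated as free wiring, which is a common assumption for fixed-point hardware.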

Theorems & Definitions (4)

Definition 1: Fundamental Operation
Definition 2: cmvm Problem
Remark 1
Remark 2
