Accelerating Error Correction Code Transformers

Matan Levy, Yoni Choukroun, Lior Wolf

TL;DR

The method brings transformer-based error correction closer to practical implementation in resource-constrained environments, achieving a 90% compression ratio and reducing arithmetic operation energy consumption by at least 224 times on modern hardware.

Abstract

Error correction codes (ECC) are crucial for ensuring reliable information transmission in communication systems. Choukroun & Wolf (2022b) recently introduced the Error Correction Code Transformer (ECCT), which has demonstrated promising performance across various transmission channels and families of codes. However, its high computational and memory demands limit its practical applications compared to traditional decoding algorithms. Achieving effective quantization of the ECCT presents significant challenges due to its inherently small architecture, since existing very low-precision quantization techniques often lead to performance degradation in compact neural networks. In this paper, we introduce a novel acceleration method for transformer-based decoders. We first propose a ternary weight quantization method specifically designed for the ECCT, inducing a decoder with multiplication-free linear layers. We present an optimized self-attention mechanism to reduce computational complexity via code-aware multi-head processing. Finally, we provide positional encoding via the Tanner graph eigendecomposition, enabling a richer representation of the graph connectivity. The approach not only matches or surpasses ECCT's performance but also significantly reduces energy consumption, memory footprint, and computational complexity. Our method brings transformer-based error correction closer to practical implementation in resource-constrained environments, achieving a 90% compression ratio and reducing arithmetic operation energy consumption by at least 224 times on modern hardware.
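
To illustrate what a "multiplication-free linear layer" can look like, the following is a minimal sketch and not the paper's quantization scheme. It assumes a generic rule (scale by the mean absolute weight, then round and clip to {-1, 0, +1}); the names `ternarize` and `ternary_linear` are hypothetical. With ternary weights, each output needs only additions and subtractions of the inputs, followed by a single multiplication by the shared scale.

```python
def ternarize(weights):
    """Map a float weight matrix to {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight. The paper's own
    quantizer may choose the scale differently.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes)
    if scale == 0:
        scale = 1.0  # all-zero matrix: any scale works
    ternary = [[max(-1, min(1, round(w / scale))) for w in row]
               for row in weights]
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Compute scale * (T @ x) using only add/subtract inside the sum."""
    out = []
    for row in ternary:
        acc = 0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # weight +1: add the input
            elif t == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: skip the input entirely
        out.append(acc * scale)  # one rescale per output element
    return out


# Toy example with integer activations.
W = [[0.9, -0.1, -1.1], [0.05, 0.6, -0.7]]
T, s = ternarize(W)
y = ternary_linear([3, 5, 2], T, s)
print(T)  # [[1, 0, -1], [0, 1, -1]]
```

In this toy example the inner loop never multiplies a weight by an activation, which is the property the abstract attributes to the ternary decoder.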

Paper Structure (30 sections, 9 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: AAP Linear Layer: (a) QAT: Training with quantization noise; (b) Inference: Matrix multiplication using only integer additions with fixed ternary weights and fixed weight scale.
  • Figure 2: Head Partitioning Self-Attention: (a) First-ring and (b) second-ring head attention mechanisms. $Q$, $K$, $V$ denote query, key, and value tensors for variable (v) or check (c) nodes. $v =$ and $c =$ indicate new representations for variable and check nodes, respectively. $M_{cv}$, $M_{vc}$, $M_{cc}$, $M_{vv}$ are HPSA masks (see Fig. 3). $\sigma$ denotes the Softmax function.
  • Figure 3: Code-aware masks of Hamming(4,7). AECCT masks utilize two distinct patterns, with each head applying only one: either first-ring or second-ring MP. First-ring MP uses c-to-v and v-to-c masks, while second-ring MP employs v-to-v and c-to-c masks. In contrast, the ECCT mask (on the left) applies both first and second rings for all heads. AECCT masks exhibit greater sparsity compared to ECCT, leading to reduced computational complexity.
  • Figure 4: Tanner PE. (a) The SPE matrix is concatenated to the initial nodes' embedding matrix. (b) Creation of the SPE vector for individual node $j$, which is then concatenated with the node’s embedding. $\lambda_i$ denotes the i-th smallest eigenvalue of the Tanner graph. $\phi_i$ denotes the eigenvector corresponding to the i-th smallest eigenvalue and $\phi_{i,j}$ is its j-th element.
  • Figure 5: Comparison of attention sparsity levels for HPSA with $h_f = h_s = 4$. Sparsity level represents the proportion of query-key dot products avoided relative to a full pairwise attention mechanism.
  • ...and 6 more figures
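
The first-ring and second-ring masks described in the captions for Figures 3 and 5 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the parity-check matrix `H` is one standard choice for the (7,4) Hamming code, nodes are ordered as variable nodes followed by check nodes, and a node is allowed to attend to itself in the second-ring mask. The names `ring_masks` and `sparsity` are hypothetical.

```python
def ring_masks(H):
    """Build boolean attention masks over [variable nodes | check nodes].

    True marks a query-key pair that is kept. The node ordering and the
    self-connections in the second-ring mask are assumptions.
    """
    m, n = len(H), len(H[0])          # m check nodes, n variable nodes
    size = n + m
    first = [[False] * size for _ in range(size)]
    second = [[False] * size for _ in range(size)]

    # First ring: variable i and check j are linked when H[j][i] == 1.
    for j in range(m):
        for i in range(n):
            if H[j][i]:
                first[i][n + j] = True   # variable query, check key
                first[n + j][i] = True   # check query, variable key

    # Second ring: two variables that share at least one check.
    for a in range(n):
        for b in range(n):
            second[a][b] = any(H[j][a] and H[j][b] for j in range(m))

    # Second ring: two checks that share at least one variable.
    for a in range(m):
        for b in range(m):
            second[n + a][n + b] = any(H[a][i] and H[b][i] for i in range(n))

    return first, second


def sparsity(mask):
    """Fraction of query-key pairs avoided relative to full attention."""
    total = len(mask) ** 2
    kept = sum(cell for row in mask for cell in row)
    return 1 - kept / total


# One standard parity-check matrix for the (7,4) Hamming code (assumed).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
first, second = ring_masks(H)
print(sparsity(first), sparsity(second))
```

Because each head applies only one of the two masks, the number of dot products per head is bounded by the number of `True` entries in that mask rather than by the full square of the sequence length.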