On the Optimality of Single-label and Multi-label Neural Network Decoders

Yunus Can Gültekin, Péter Scheepers, Yuncheng Yuan, Federico Corradi, Alex Alvarado

TL;DR

This work asks whether SLNN and MLNN decoders can achieve maximum likelihood (ML) decoding with simple architectures. It gives constructive proofs (Theorems 1 and 2) of optimal, training-free NN designs defined by the codebook and validates them on short codes, namely the Hamming $(n,k)=(7,4)$, polar $(16,8)$, and BCH $(31,21)$ codes, demonstrating codeword-wise and bit-wise ML performance. The key contributions are explicit binary-weight architectures that realize ML decoding with lower complexity than the trained decoders in the literature, and a scalability analysis highlighting the curse of dimensionality for longer codes. The results imply that for short blocklengths, ML decoding can be achieved with simple, sparse NNs, which guides decoder design while clarifying the practical limits for medium and long codes.

Abstract

We investigate the design of two neural network (NN) architectures recently proposed as decoders for forward error correction: the so-called single-label NN (SLNN) and multi-label NN (MLNN) decoders. These decoders have been reported to achieve near-optimal codeword- and bit-wise performance, respectively. Results in the literature show near-optimality for a variety of short codes. In this paper, we analytically prove that certain SLNN and MLNN architectures can, in fact, always realize optimal decoding, regardless of the code. These optimal architectures and their binary weights are shown to be defined by the codebook, i.e., no training or network optimization is required. Our proposed architectures are in fact not NNs, but a different way of implementing the maximum likelihood decoding rule. Optimal performance is numerically demonstrated for Hamming $(7,4)$, Polar $(16,8)$, and BCH $(31,21)$ codes. The results show that our optimal architectures are less complex than the SLNN and MLNN architectures proposed in the literature, which in fact only achieve near-optimal performance. Extension to longer codes is still hindered by the curse of dimensionality. Therefore, even though SLNN and MLNN can perform maximum likelihood decoding, such architectures cannot be used for medium and long codes.
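To make the curse of dimensionality concrete, the following minimal sketch counts the output neurons a codebook-based decoder would need for the three codes named above. It assumes one output neuron per codeword, i.e., $2^k$ outputs for an $(n,k)$ code, as in the two-layer network of Theorem 1. The names `codes`, `num_output_neurons`, and `sizes` are illustrative and do not come from the paper.

```python
# Output-layer size of a codebook-based decoder, assuming one output
# neuron per codeword (2^k outputs for an (n, k) code).
codes = {"Hamming": (7, 4), "Polar": (16, 8), "BCH": (31, 21)}


def num_output_neurons(k: int) -> int:
    """Number of codewords of a binary code with k information bits."""
    return 2 ** k


sizes = {name: num_output_neurons(k) for name, (n, k) in codes.items()}

for name, (n, k) in codes.items():
    print(f"{name} ({n},{k}): {sizes[name]} output neurons")
# Hamming (7,4): 16 output neurons
# Polar (16,8): 256 output neurons
# BCH (31,21): 2097152 output neurons
```

Each additional information bit doubles the output layer, which is why this approach does not scale to medium and long codes.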

Paper Structure

This paper contains 9 sections, 2 theorems, 17 equations, 8 figures, 1 table.

Key Result

Theorem 1

Let $(n,k)$ be a linear block code, with codebook $\mathcal{C}$ containing $2^k$ codewords. Consider a two-layer NN with $n$ input neurons and $2^k$ output neurons. Let the $n$-by-$2^k$ weight matrix $\boldsymbol{W}_1$ connecting the input layer to the output layer be binary and have its columns equ
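The theorem statement is cut off here, so the following is only a hedged sketch of the general idea, not the paper's exact construction. It assumes that the columns of the binary weight matrix are the codewords, that bits are sent with BPSK using the mapping 0 → +1, 1 → −1 over an AWGN channel, and that the decision picks the smallest output (with the opposite mapping it would pick the largest). The generator matrix `G` is one standard systematic choice for a Hamming $(7,4)$ code and is also an assumption. The sketch checks numerically that a single linear layer followed by this decision agrees with brute-force minimum-distance (ML) decoding.

```python
import itertools
import random

n, k = 7, 4
# Assumed systematic generator matrix of a Hamming (7,4) code
# (illustrative choice; the paper's matrix may differ).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(u):
    """Encode k information bits into an n-bit codeword (mod-2 arithmetic)."""
    return [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]


codebook = [encode(u) for u in itertools.product([0, 1], repeat=k)]

# Binary n-by-2^k weight matrix whose column j is codeword j (assumption).
W = [[codebook[j][i] for j in range(2 ** k)] for i in range(n)]


def layer_decode(r):
    """One linear layer, then pick the output neuron with the smallest value."""
    scores = [sum(W[i][j] * r[i] for i in range(n)) for j in range(2 ** k)]
    return min(range(2 ** k), key=scores.__getitem__)


def ml_decode(r):
    """Brute-force ML: closest BPSK codeword in Euclidean distance."""
    def dist(c):
        return sum((r[i] - (1 - 2 * c[i])) ** 2 for i in range(n))
    return min(range(2 ** k), key=lambda j: dist(codebook[j]))


random.seed(0)
agree = True
for _ in range(1000):
    c = random.choice(codebook)
    r = [(1 - 2 * b) + random.gauss(0, 0.8) for b in c]  # BPSK plus noise
    agree = agree and (layer_decode(r) == ml_decode(r))
print(agree)
```

The agreement follows because $\|\boldsymbol{r}-\boldsymbol{x}\|^2 = \|\boldsymbol{r}\|^2 - 2\langle \boldsymbol{r},\boldsymbol{x}\rangle + n$ and, under the assumed mapping $x_i = 1-2c_i$, maximizing $\langle \boldsymbol{r},\boldsymbol{x}\rangle$ is the same as minimizing $\sum_i c_i r_i$, which is exactly what each output neuron computes.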

Figures (8)

  • Figure 1: Communication system model considered in this work. Three possible encoders are used in combination with four different decoders. Maximum likelihood decoding is based on \ref{eq:mlcodeword} or \ref{eq:mapcri}. The SLNN and MLNN decoders are those proposed in [motani2019globecom, motani2020icc, motani2020isit, motani2023trans]. BDD decoding is based on hard decisions (HDs) $\hat{\boldsymbol{r}}$ made on $\boldsymbol{r}$.
  • Figure 2: SLNN $7$-$7$-$16$ architecture used in [motani2023trans] to decode the Hamming $(7,4)$ code and obtain near-optimal performance.
  • Figure 3: FER vs. $E_{\text{b}}/N_0$ for different SLNN decoders for the Hamming $(7,4)$ code. The results for SLNN $7$-$5$-$16$ and SLNN $7$-$7$-$16$ are based on our implementation of the architectures proposed in [motani2023trans].
  • Figure 4: SLNN $7$-$0$-$16$ architecture proposed in Theorem \ref{theorem1} to realize maximum likelihood decoding. The hidden layer in Fig. \ref{fig:SLNN_FERvsSNR} is not present (it is not required for optimal performance). The number of edges here is $56$, while for SLNN $7$-$7$-$16$ in Fig. \ref{fig:SLNN_FERvsSNR} the number of edges is $161$.
  • Figure 5: MLNN $7$-$50$-$50$-$4$ architecture used in [motani2023trans] to decode the Hamming $(7,4)$ code and obtain near-optimal performance.
  • ...and 3 more figures
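The edge counts quoted in the Figure 4 caption can be reproduced with a short calculation. This is a sketch under two assumptions that the captions do not state explicitly: the trained architectures are fully connected with biases not counted as edges, and the $56$ edges of the SLNN $7$-$0$-$16$ correspond to the nonzero (one-valued) entries of a binary weight matrix whose columns are the codewords. For a linear code with no all-zero coordinate, each coordinate equals one in exactly half of the codewords, which gives $n \cdot 2^{k-1}$ ones in total. The function names are illustrative.

```python
def dense_edges(layers):
    """Edges of a fully connected feed-forward network (biases not counted)."""
    return sum(a * b for a, b in zip(layers, layers[1:]))


def codebook_ones(n, k):
    """Ones in the codebook of an (n, k) linear code with no all-zero coordinate."""
    return n * 2 ** (k - 1)


slnn_trained = dense_edges([7, 7, 16])       # SLNN 7-7-16
slnn_direct = codebook_ones(7, 4)            # SLNN 7-0-16, zero weights omitted
mlnn_trained = dense_edges([7, 50, 50, 4])   # MLNN 7-50-50-4, same counting rule

print(slnn_trained, slnn_direct, mlnn_trained)  # 161 56 3050
```

The first two values match the $161$ and $56$ edges given in the caption; the MLNN figure is only what the same counting rule yields and is not a number taken from the paper.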

Theorems & Definitions (6)

  • Example 1: SLNN for Hamming $(7,4)$
  • Theorem 1
  • Example 1: SLNN for Hamming $(7,4)$ continued
  • Example 2: MLNN for Hamming $(7,4)$
  • Theorem 2
  • Example 2: MLNN for Hamming $(7,4)$ continued