Generalization Bounds for Neural Belief Propagation Decoders

Sudarshan Adiga; Xin Xiao; Ravi Tandon; Bane Vasic; Tamal Bose

TL;DR

This work addresses the lack of theoretical generalization guarantees for Neural Belief Propagation (NBP) decoders used on LDPC-type codes. It develops a PAC-learning framework based on bit-wise Rademacher complexity and covering numbers to bound the generalization gap in bit-error rate (BER) as an explicit function of the training set size $m$, the number of decoding iterations $T$, and the code parameters $(n,k,d_v,d_c)$. The analysis extends to irregular parity-check matrices and accounts for the channel SNR through a bound on the input log-likelihood ratios (LLRs), including a treatment of unbounded LLRs. Experiments on Tanner and QC-LDPC codes corroborate the theory: the generalization gap decreases with $m$ and increases with $T$ and the blocklength, which gives practical guidance for dataset and code design in ML-based decoders.

Abstract

Machine learning based approaches are increasingly used to design decoders for next-generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network whose parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between its empirical and expected bit-error rates. We present new theoretical results that bound this gap and show its dependence on the decoder complexity, expressed in terms of the code parameters (blocklength, message length, variable/check node degrees), the number of decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on the generalization performance of neural network based decoders. We present experimental results that show the dependence of the generalization gap on the training dataset size and the number of decoding iterations for different codes.
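As a minimal sketch of the quantity defined above, the gap can be estimated by comparing the BER on the training set with the BER on a held-out set, which stands in for the expected BER. The function names, the sign convention (held-out minus training), and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def bit_error_rate(decoded_bits, true_bits):
    """Fraction of bit positions where the decoder output differs from the transmitted bits."""
    decoded_bits = np.asarray(decoded_bits)
    true_bits = np.asarray(true_bits)
    return float(np.mean(decoded_bits != true_bits))

def estimated_generalization_gap(train_decoded, train_true, test_decoded, test_true):
    """Held-out BER minus training BER (held-out BER approximates the expected BER)."""
    return bit_error_rate(test_decoded, test_true) - bit_error_rate(train_decoded, train_true)

# Toy data (made up): 4 all-zero codewords of length 5 in each set.
train_true = np.zeros((4, 5), dtype=int)
train_decoded = train_true.copy()
train_decoded[0, 0] = 1        # 1 bit error out of 20 on the training set

test_true = np.zeros((4, 5), dtype=int)
test_decoded = test_true.copy()
test_decoded[0, :3] = 1        # 3 bit errors out of 20 on the held-out set

gap = estimated_generalization_gap(train_decoded, train_true, test_decoded, test_true)
print(f"estimated gap: {gap:.2f}")
```

In practice the held-out set would need to be much larger than the training set for its BER to be a reliable proxy for the expected BER.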

Paper Structure (11 sections, 7 theorems, 72 equations, 7 figures, 1 table)

Introduction
Preliminaries and Problem Statement
Main Results
Experimental Results
Conclusions
Proof of Proposition 1
Proof of Theorem 1
Lipschitzness in NBP decoders
Bound on covering number of sparse matrices
Proof of Theorem 2
Proof of Theorem 3

Key Result

Proposition 1

For any $\delta \in (0,1)$, with probability at least $1-\delta$, the generalization gap for any NBP decoder $f\in \mathcal{F}_T$ can be upper bounded as follows, where $R_m(\mathcal{F}_{T}[j])$ denotes the bit-wise Rademacher complexity for the $j$th output bit.
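The displayed inequality is missing from this copy. As a hedged indication of its general shape only: Rademacher-type bounds for a loss taking values in $[0,1]$ usually combine a complexity term with a $\sqrt{\log(1/\delta)/m}$ confidence term, and a bit-wise version would average the complexity over the $n$ output bits. The symbols $\mathcal{R}_{\mathrm{BER}}$ and $\hat{\mathcal{R}}_{\mathrm{BER}}$ (expected and empirical BER) and the unspecified constants $c_1$, $c_2$ are assumptions, not the paper's exact statement.

```latex
\mathcal{R}_{\mathrm{BER}}(f) - \hat{\mathcal{R}}_{\mathrm{BER}}(f)
  \;\le\; \frac{c_1}{n} \sum_{j=1}^{n} R_m\bigl(\mathcal{F}_T[j]\bigr)
  \;+\; c_2 \sqrt{\frac{\log(1/\delta)}{m}}
```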

Figures (7)

Figure 1: (a) End-to-End block diagram for communication using neural belief propagation (NBP) decoders for linear block codes; (b) Architecture of the NBP decoder for $T$ decoding iterations where each decoding iteration corresponds to $2$ hidden layers: (1) variable node layer, (2) parity check node layer.
Figure 2: Right-hand side (RHS) of the bound in Theorem 1 vs (a) dataset size ($m$), (b) decoding iterations ($T$), (c) blocklength ($n$), (d) variable node degree ($d_v$).
Figure 3: (a) The total generalization gap from Theorem 3, the generalization gap from Theorem 1, and the generalization gap due to the unbounded log-likelihood ratio, as functions of the channel SNR; (b) selecting the bound on the LLR ($b_{\lambda}$) to minimize the generalization gap.
Figure 4: Generalization gap as a function of the dataset size $m$ at channel SNR = $2$ dB for (a) Tanner code with $n = 155$, and $k = 93$, (b) Tanner code with $n = 310$, and $k = 186$.
Figure 5: Generalization gap as a function of the decoding iterations $T$ ($\propto$ number of layers) at channel SNR = $2$ dB for (a) Tanner code with $n = 155$, and $k = 93$, (b) Tanner code with $n = 310$, and $k = 186$.
...and 2 more figures
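The layered structure described in the caption of Figure 1(b), where each of the $T$ decoding iterations contributes a variable node layer followed by a parity check node layer, can be sketched as an unrolled decoder. The sketch below is an illustration under stated assumptions, not the paper's decoder: it uses min-sum check updates as a simplification, places hypothetical per-edge weights on the check-to-variable messages, assumes every check node has degree at least 2, and uses the (7,4) Hamming code only as a small test case.

```python
import numpy as np

def nbp_decode(H, llr, T, weights=None):
    """Unrolled weighted min-sum decoding (illustrative stand-in for an NBP decoder).

    H:       (n-k, n) 0/1 parity-check matrix, every row with at least 2 ones.
    llr:     (n,) channel log-likelihood ratios (positive means bit 0 is more likely).
    weights: (T, n-k, n) multipliers on check-to-variable messages;
             all ones (the default) gives plain min-sum.
    """
    H = np.asarray(H, dtype=bool)
    llr = np.asarray(llr, dtype=float)
    if weights is None:
        weights = np.ones((T,) + H.shape)
    c2v = np.zeros(H.shape)                      # check-to-variable messages
    for t in range(T):
        # Variable node layer: channel LLR plus incoming messages, excluding the target edge.
        total = llr + c2v.sum(axis=0)
        v2c = np.where(H, total[None, :] - c2v, 0.0)
        # Parity check node layer: sign product and minimum magnitude over the other edges.
        new_c2v = np.zeros(H.shape)
        for c in range(H.shape[0]):
            idx = np.flatnonzero(H[c])
            for v in idx:
                others = idx[idx != v]
                sign = np.prod(np.sign(v2c[c, others]))
                mag = np.min(np.abs(v2c[c, others]))
                new_c2v[c, v] = sign * mag
        c2v = weights[t] * new_c2v               # trainable weights would enter here
    posterior = llr + c2v.sum(axis=0)
    return (posterior < 0).astype(int)           # hard decision per bit

# (7,4) Hamming code, all-zero codeword sent, one unreliable bit at position 0.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.array([-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])

decoded = nbp_decode(H, llr, T=2)
print(decoded)
```

With `T=0` the function reduces to a hard decision on the channel LLRs, so the depth of the unrolled network is controlled entirely by `T`, matching the two-layers-per-iteration description in the caption.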

Theorems & Definitions (18)

Definition 1
Definition 2
Proposition 1
Theorem 1
Remark 1: Representation in Terms of Code-rate and Parity Check Node Degree
Remark 2: Impact of the Code-parameters
Remark 3: Comparison with Other Approaches for Bounding the Generalization Gap
Theorem 2
Theorem 3
Remark 4: Minimizing the generalization gap by selecting the bound on LLR ($b_{\lambda}$) based on Channel SNR
...and 8 more
