Policy-Guided MCTS for near Maximum-Likelihood Decoding of Short Codes

Y. Tian, C. Yue, P. Cheng, G. Pang, B. Vucetic, Y. Li

TL;DR

This work tackles the challenge of achieving near-MLD decoding for short block codes without the Gaussian-elimination overhead of traditional OSD. It introduces a policy-guided Monte Carlo Tree Search that builds and traverses a deterministic TEP tree, with a neural policy trained via MCTS to steer search toward the correct test error patterns. Training uses self-generated supervision from near-MLD outcomes, enabling the policy to generalize across noisy realizations. Experiments on the $(32,16)$ eBCH and $(48,24)$ QR codes show substantial reductions in TEP searches and decoding latency at high SNRs, with near-MLD BLER achieved at practical orders, and the framework generalizes to other OSD variants and stopping rules.

Abstract

In this paper, we propose a policy-guided Monte Carlo Tree Search (MCTS) decoder that achieves near maximum-likelihood decoding (MLD) performance for short block codes. The MCTS decoder searches for test error patterns (TEPs) in the received information bits and obtains codeword candidates through re-encoding. The TEP search is executed on a tree structure, guided by a neural network policy trained via MCTS-based learning. The trained policy guides the decoder to find the correct TEP in as few steps as possible from the root node (the all-zero TEP). The decoder outputs the most likely codeword found once the early stopping criterion is satisfied. Unlike ordered statistics decoding (OSD), the proposed method requires no Gaussian elimination (GE), and it can reduce search complexity by 95% compared to non-GE OSD. It achieves lower decoding latency than both OSD and non-GE OSD at high SNRs.
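To make the "TEP plus re-encoding" step concrete, here is a minimal sketch of a GE-free candidate search. It is not the paper's MCTS decoder: the policy-guided tree search is replaced by a plain enumeration of TEPs in order of increasing Hamming weight, there is no early stopping, and the (7,4) Hamming code, the BPSK mapping (0 to +1, 1 to -1), and all function names (`encode`, `enumerate_teps`, `decode`) are assumptions chosen for illustration only.

```python
from itertools import combinations

# Assumption: a (7,4) Hamming code in systematic form G = [I_4 | P],
# used only as a small stand-in for the short codes in the paper.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
K, N = 4, 7


def encode(info):
    """Re-encode K information bits into an N-bit systematic codeword."""
    parity = [sum(info[i] * P[i][j] for i in range(K)) % 2 for j in range(N - K)]
    return list(info) + parity


def enumerate_teps(k, max_weight):
    """Yield TEPs over the k information bits, lowest Hamming weight first.

    The first TEP yielded is the all-zero pattern (the tree root in the paper).
    """
    for weight in range(max_weight + 1):
        for flips in combinations(range(k), weight):
            tep = [0] * k
            for i in flips:
                tep[i] = 1
            yield tep


def decode(y, max_weight=2):
    """Return (best codeword, number of TEPs searched) for soft input y.

    Assumes BPSK with 0 -> +1 and 1 -> -1. No Gaussian elimination is done:
    the first K received positions are used directly as the information set.
    """
    hard = [0 if v >= 0 else 1 for v in y]
    best, best_metric, searched = None, float("inf"), 0
    for tep in enumerate_teps(K, max_weight):
        info = [h ^ e for h, e in zip(hard[:K], tep)]
        candidate = encode(info)
        # Weighted Hamming distance: total reliability of the positions
        # where the candidate disagrees with the hard decision.
        metric = sum(abs(v) for v, c, h in zip(y, candidate, hard) if c != h)
        searched += 1
        if metric < best_metric:
            best, best_metric = candidate, metric
    return best, searched


# Codeword [1,0,1,1,0,1,0] sent; the second information bit is received
# with the wrong sign and low reliability.
y = [-0.9, -0.2, -1.1, -0.8, 1.0, -1.2, 0.7]
codeword, searched = decode(y)
```

In this example the weight-1 TEP `[0, 1, 0, 0]` recovers the transmitted codeword, but the exhaustive loop still re-encodes all 11 TEPs of weight at most 2. A guided search with an early stopping rule, as the abstract describes, aims to reach the right TEP and stop after far fewer re-encodings.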

Paper Structure

This paper contains 28 sections, 8 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: The TEP tree for $k=5$ and $m=3$.
  • Figure 2: An MCTS iteration on a TEP tree. MCTS reaches TEP 00100, selects an unexpanded child 01000, then expands, evaluates, and backpropagates.
  • Figure 3: Example of depth-first TEP search. The search first visits the left child via action $a^*$, then backtracks to explore the right child.
  • Figure 4: Comparison of OSD, non-GE OSD and MCTS decoding for (32,16) eBCH code: (a) BLER performance; (b) average searched TEPs under perfect stopping criterion; (c) average searched TEPs under practical stopping criterion; (d) average decoding time under practical stopping criterion.
  • Figure 5:

Theorems & Definitions (3)

  • Definition 1: TEP Tree
  • Remark 1: Target TEP and codeword
  • Remark 2