A Node-Based Polar List Decoder with Frame Interleaving and Ensemble Decoding Support

Yuqing Ren, Leyu Zhang, Ludovic Damien Blanc, Yifei Shen, Xinwei Li, Alexios Balatsoukas-Stimming, Chuan Zhang, Andreas Burg

TL;DR

This work tackles latency and efficiency bottlenecks in node-based SCL polar decoders with a frame-interleaving architecture that time-shares the SCU and NPU to decode two frames concurrently. Two dynamic strategies (S1 and S2) reduce the residual stalls in the decoding schedule, and an online instruction generator for SR and basic nodes enables rate-flexible operation without offline instruction storage. Graph-ensemble decoding via permuted factor graphs is integrated in Modes II/III to improve error-correcting performance at a modest throughput cost. The proposed 28 nm FD-SOI ASIC achieves a throughput of $3.34$ Gbps at $692$ MHz for UL-$(1024,512)$ with an area efficiency of $2.62$ Gbps/mm$^2$, and improves on baseline decoders in both throughput and energy efficiency while supporting all 5G NR polar codes. The result is a flexible, high-throughput, area- and energy-efficient decoder for low-latency polar decoding in 5G.

Abstract

Node-based successive cancellation list (SCL) decoding has received considerable attention in wireless communications for its significant reduction in decoding latency, particularly with 5G New Radio (NR) polar codes. However, existing node-based SCL decoders are constrained by sequential processing, leading to complicated and data-dependent computational units that introduce unavoidable stalls and reduce hardware efficiency. In this paper, we present a frame-interleaving hardware architecture for a generalized node-based SCL decoder. By efficiently reusing otherwise idle computational units, two independent frames can be decoded simultaneously, resulting in a significant throughput gain. Based on this new architecture, we further exploit graph ensembles to diversify the decoding space, thus enhancing the error-correcting performance with a limited list size. Two dynamic strategies are proposed to eliminate the residual stalls in the decoding schedule, which eventually results in nearly 2$\times$ the throughput of the state-of-the-art baseline node-based SCL decoder. To make the decoder rate-flexible, we develop a novel online instruction generator that identifies the generalized nodes and produces instructions on the fly. The corresponding 28 nm FD-SOI ASIC SCL decoder with a list size of 8 has a core area of 1.28 mm$^2$ and operates at 692 MHz. It is compatible with all 5G NR polar codes and achieves a throughput of 3.34 Gbps and an area efficiency of 2.62 Gbps/mm$^2$ for uplink (1024, 512) codes, which are 1.41$\times$ and 1.69$\times$ better, respectively, than the state-of-the-art node-based SCL decoders.
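The idea of reusing otherwise idle units can be illustrated with a toy scheduling model. The sketch below is not the decoder's actual schedule: it assumes a frame is a strictly sequential chain of operations that alternate between two shared units (labelled `SCU` and `NPU` after the TL;DR), and the operation list, cycle counts, and function names are hypothetical.

```python
# Toy model (an assumption, not the paper's scheduler): each frame is a
# sequential chain of (unit, cycles) operations; each unit serves one
# operation at a time.

def single_frame_cycles(ops, n_frames=2):
    """Frames decoded back to back: one unit always idles while the other works."""
    return n_frames * sum(cycles for _, cycles in ops)

def interleaved_cycles(ops, n_frames=2):
    """Greedy interleaving: always issue the operation that can start earliest."""
    unit_free = {}                  # cycle at which each unit becomes free
    frame_ready = [0] * n_frames    # cycle at which each frame's last op ended
    next_op = [0] * n_frames        # index of each frame's next operation
    while any(i < len(ops) for i in next_op):
        best = None
        for f in range(n_frames):
            if next_op[f] >= len(ops):
                continue
            unit, _ = ops[next_op[f]]
            start = max(frame_ready[f], unit_free.get(unit, 0))
            if best is None or start < best[0]:   # ties go to the lower frame index
                best = (start, f)
        start, f = best
        unit, cycles = ops[next_op[f]]
        unit_free[unit] = frame_ready[f] = start + cycles
        next_op[f] += 1
    return max(frame_ready)

# Hypothetical workload: three SCU/NPU rounds of two cycles each per frame.
ops = [("SCU", 2), ("NPU", 2)] * 3
sequential = single_frame_cycles(ops)
interleaved = interleaved_cycles(ops)
print(sequential, interleaved, sequential / interleaved)
```

With equal-length operations the second frame simply fills the gaps left by the first. When the durations are unequal, the same model leaves gaps in the schedule, which is the kind of residual stall the two dynamic strategies target.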

Paper Structure (25 sections, 8 equations, 16 figures, 2 tables, 1 algorithm)


Figures (16)

  • Figure 1: (a) Decoding tree for $N=64$ polar codes with special nodes. (b) Decoding tree after factor graph permutation for $N=64$ polar codes. Distinct markers denote the node types R0, REP, R1, SPC, TYPE-III, SR, and a node of any rate. The superscript $^\prime$ indicates that the vectors $\bm{\lambda}$ and $\bm{\beta}$ are permuted.
  • Figure 2: Top-level overview of the proposed SCL decoder with interleaving architecture, where the red dotted lines show the critical path.
  • Figure 3: Comparison between the baseline decoder [Ren22Sequence] and our frame-interleaving SCL decoder operating in various modes for the DL-$(432,140)$ code using SCL-$8$ decoding with $|\mathbb{P}|=8$.
  • Figure 4: Decoding schedules of two frames using conventional single-frame architecture and our interleaving architecture, where the $i$-th frame is denoted as $\mathsf{F}i$. Operations on different nodes are marked by distinct colors for clarity.
  • Figure 5: Decoding schedules of two graphs on the same frame using conventional single-frame architecture and our interleaving architecture, where the $i$-th graph is denoted as $\mathsf{G}i$.
  • ...and 11 more figures