DecodeX: Exploring and Benchmarking of LDPC Decoding across CPU, GPU, and ASIC Platforms
Zhenzhou Qi, Yuncheng Yao, Yiming Li, Chung-Hsuan Tung, Junyao Zheng, Danyang Zhuo, Tingjun Chen
TL;DR
This work addresses the challenge of efficiently decoding LDPC codes in heterogeneous vRAN environments by introducing DecodeX, a cross-platform benchmarking framework that unifies CPU, GPU, and ASIC LDPC decoding implementations under a common interface. The authors implement and profile four decoding paths (DecodeCPU-FlexRAN, DecodeGPU-Aerial, DecodeASIC-ACC100, and DecodeGPU-SionnaRK) and analyze how threading, memory movement, and offload orchestration affect latency across varying modulation and coding scheme (MCS), signal-to-noise ratio (SNR), and physical resource block (PRB) settings. Key findings show that accelerator gains depend strongly on data movement and workload granularity: the ACC100 and GPU-based decoders deliver substantial latency reductions compared to the CPU, while inline GPU paths minimize transfer overhead and yield the best end-to-end performance. DecodeX provides actionable insights for cross-platform co-design and dynamic resource management in future NextG vRANs, and offers an open-source suite for benchmarking, reproducing results, and extending to new architectures and configurations.
Abstract
Emerging virtualized radio access networks (vRANs) demand flexible and efficient baseband processing across heterogeneous compute substrates. In this paper, we present DecodeX, a unified benchmarking framework for evaluating low-density parity-check (LDPC) decoding acceleration across different hardware platforms. DecodeX integrates a comprehensive suite of LDPC decoder implementations, including kernels, APIs, and test vectors for CPUs (FlexRAN), GPUs (Aerial and Sionna-RK), and an ASIC (ACC100), and can be readily extended to additional architectures and configurations. Using DecodeX, we systematically characterize how different platforms orchestrate computation, from threading and memory management to data movement and accelerator offload, and quantify the resulting decoding latency under varying physical-layer parameters. Our observations reveal distinct trade-offs in parallel efficiency and offload overhead, showing that accelerator gains depend strongly on data movement and workload granularity. Building on these insights, we discuss how cross-platform benchmarking can inform adaptive scheduling and co-design for future heterogeneous vRANs, enabling scalable and energy-efficient baseband processing for NextG wireless systems.
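
To illustrate the idea of benchmarking several decoders behind a common interface, the following is a minimal Python sketch, not the DecodeX API. The names `Decoder`, `benchmark`, `dummy_cpu_decoder`, and `dummy_offload_decoder` are hypothetical, and the two decoders are placeholder workloads rather than real LDPC decoders. The sketch only shows how one timing loop can sweep the same MCS, SNR, and PRB grid over every backend.

```python
import statistics
import time
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical common interface: one call performs one decode for the
# given (mcs, snr_db, num_prb) configuration. Not the DecodeX API.
Decoder = Callable[[int, float, int], None]


def benchmark(
    decoders: Dict[str, Decoder],
    mcs_values: Iterable[int],
    snr_values: Iterable[float],
    prb_values: Iterable[int],
    repeats: int = 5,
) -> Dict[Tuple[str, int, float, int], float]:
    """Return the median wall-clock latency in seconds per configuration."""
    grid = list(product(mcs_values, snr_values, prb_values))
    results = {}
    for name, decode in decoders.items():
        for mcs, snr_db, num_prb in grid:
            samples = []
            for _ in range(repeats):
                start = time.perf_counter()
                decode(mcs, snr_db, num_prb)
                samples.append(time.perf_counter() - start)
            results[(name, mcs, snr_db, num_prb)] = statistics.median(samples)
    return results


def dummy_cpu_decoder(mcs: int, snr_db: float, num_prb: int) -> None:
    # Placeholder workload whose cost grows with the PRB count (assumption).
    sum(i * i for i in range(num_prb * 100))


def dummy_offload_decoder(mcs: int, snr_db: float, num_prb: int) -> None:
    # Placeholder for an offloaded path; a real one would include transfers.
    sum(range(num_prb))


if __name__ == "__main__":
    latencies = benchmark(
        {"cpu": dummy_cpu_decoder, "offload": dummy_offload_decoder},
        mcs_values=[10, 20],
        snr_values=[5.0],
        prb_values=[25, 100],
    )
    for key, seconds in sorted(latencies.items()):
        print(key, f"{seconds * 1e6:.1f} us")
```

Keeping the timing loop outside the decoders means every backend is measured the same way, so any transfer or launch overhead incurred inside a decoder call is counted in its end-to-end latency.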
