Towards High-Performance Network Coding: FPGA Acceleration With Bounded-value Generators
Jiaxin Qing, Philip H. W. Leong, Kin Hong Lee, Raymond W. Yeung
TL;DR
This paper addresses the practicality of high-throughput network coding with Batched Sparse (BATS) codes in hardware. It introduces CS-BATS, a structured variant that enables efficient hardware mapping, and bounded-value (BV) generators, which shrink finite-field multipliers while preserving coding performance. The authors design a scalable FPGA accelerator using BATS Compute Units (CUs), matrix tiling, multi-level parallelism, and high-bandwidth memory (HBM), achieving up to 27 Gbps throughput and a speedup of more than 300× over software, with BV generators reducing multiplier area by up to 70%. Theoretical and empirical analyses show that BV generators have negligible impact on coding performance when sized appropriately (e.g., $L(2^2)$), and the implementation results show throughput scaling with the number of CUs, port configurations, and HBM settings. Overall, the work demonstrates a viable hardware-software co-design path for practical, high-rate network coding, with implications for wireless and distributed storage systems.
Abstract
Network coding enhances performance in network communications and distributed storage by increasing throughput and robustness while reducing latency. Batched Sparse (BATS) codes are a class of capacity-achieving network codes, but their practical application is hindered by their structure, their computational intensity, and the power demands of finite field operations. Most of the literature focuses on algorithmic-level techniques to improve coding efficiency, while algorithm/hardware co-design has long been neglected. Leveraging the unique structure of BATS codes, we first present CS-BATS, a hardware-friendly variant. Next, we propose a simple but effective bounded-value generator that reduces the size of a finite field multiplier by up to 70%. Finally, we report on a scalable and resource-efficient FPGA-based network coding accelerator that achieves a throughput of 27 Gbps, a speedup of more than 300× over software.
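To illustrate why bounding generator values can shrink a finite field multiplier, the following is a minimal sketch and not the authors' design. It assumes the field is GF(2^8) with reduction polynomial 0x11D and that a bounded-value coefficient is one restricted to fewer than `bits` significant bits (here 2, i.e. values 0 to 3); the abstract does not state any of these details, and the function names are hypothetical. Under those assumptions, a shift-and-add multiplier needs only `bits` partial products instead of 8, which is the kind of saving a hardware multiplier could exploit.

```python
# Sketch only: field size, polynomial, and the 2-bit bound are assumptions,
# not values taken from the paper.

def gf256_mul(a, b, poly=0x11D):
    """General shift-and-add multiply in GF(2^8): up to 8 partial products."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current partial product
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= poly            # reduce modulo the field polynomial
    return result


def gf256_mul_bounded(coeff, data, bits=2, poly=0x11D):
    """Multiply when coeff < 2**bits: only `bits` partial products are needed."""
    if not 0 <= coeff < (1 << bits):
        raise ValueError("coefficient exceeds the assumed bound")
    result = 0
    for i in range(bits):
        if (coeff >> i) & 1:
            result ^= data
        data <<= 1
        if data & 0x100:
            data ^= poly
    return result


if __name__ == "__main__":
    # The bounded multiplier agrees with the general one on every bounded input.
    ok = all(
        gf256_mul_bounded(c, d) == gf256_mul(c, d)
        for c in range(4)
        for d in range(256)
    )
    print("bounded multiplier matches general multiplier:", ok)
```

In this sketch the loop count stands in for hardware area: the bounded version unrolls to two conditional XOR stages rather than eight. The actual area reduction reported in the paper (up to 70%) depends on the authors' multiplier architecture, which this example does not model.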
