Efficient Quantization-Aware Neural Receivers: Beyond Post-Training Quantization

SaiKrishna Saketh Yellapragada; Esa Ollila; Mario Costa

TL;DR

The paper addresses the challenge of deploying deep learning (DL)-based neural receivers at the PHY layer on resource-constrained 6G devices through quantization-aware training (QAT). It extends Post-Training Quantization (PTQ) with a differentiable training pipeline that simulates low-precision arithmetic via fake quantization operators $F_b$, jointly optimizing the network weights $W$ and the quantization parameters $\phi_w$ using the straight-through estimator (STE). In link-level evaluations on 3GPP CDL-B/D channels under LoS and NLoS conditions with user velocities up to $40$ m/s, 4-bit and 8-bit QAT models achieve BLERs close to FP32 at a $10\%$ target BLER and, in NLoS, outperform PTQ while delivering up to 8× weight compression (4-bit versus FP32). The results demonstrate that QAT enables low-latency, energy-efficient PHY inference suitable for real-time edge processing in 6G, with strong potential for hardware-software co-design and for future extensions such as mixed-precision and activation quantization.
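To make the fake quantization operator $F_b$ and the STE concrete, the sketch below shows one common form: a symmetric, signed, per-tensor uniform quantizer with a learnable scale playing the role of $\phi_w$. This is an assumption for illustration, not the paper's exact $F_b$; the scale gradient follows the LSQ (learned step size) convention, and the helper names `quant_range`, `fake_quant`, and `fake_quant_grads` are hypothetical. Weights are handled one scalar at a time for clarity.

```python
def quant_range(bits: int) -> tuple[int, int]:
    """Signed integer grid for a symmetric b-bit quantizer (assumed form)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fake_quant(w: float, scale: float, bits: int) -> float:
    """Forward pass of a hypothetical F_b: quantize to the integer grid,
    clip to the b-bit range, then dequantize back to a float."""
    qmin, qmax = quant_range(bits)
    q = min(max(round(w / scale), qmin), qmax)  # Python round(): half-to-even
    return q * scale


def fake_quant_grads(w: float, scale: float, bits: int) -> tuple[float, float]:
    """Backward pass: returns (dF/dw, dF/dscale).

    dF/dw uses the straight-through estimator: round() is treated as the
    identity inside the clipping range and the gradient is zero outside it.
    dF/dscale follows the LSQ convention (assumption; LSQ's extra gradient
    scaling factor is omitted here).
    """
    qmin, qmax = quant_range(bits)
    v = w / scale
    if v < qmin:
        return 0.0, float(qmin)   # clipped low: only the scale receives gradient
    if v > qmax:
        return 0.0, float(qmax)   # clipped high: only the scale receives gradient
    return 1.0, float(round(v) - v)  # in range: STE passes the gradient through
```

In a full QAT loop, the forward pass would use `fake_quant` for every weight, and the chain rule would multiply the upstream loss gradient by these two local derivatives to update both $W$ and the scale, so the network learns weights that remain accurate after rounding.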

Abstract

As wireless communication systems advance toward Sixth Generation (6G) Radio Access Networks (RAN), Deep Learning (DL)-based neural receivers are emerging as transformative solutions for Physical Layer (PHY) processing, delivering superior Block Error Rate (BLER) performance compared to traditional model-based approaches. Practical deployment on resource-constrained hardware, however, requires efficient quantization to reduce latency, energy, and memory without sacrificing reliability. In this paper, we extend Post-Training Quantization (PTQ) by focusing on Quantization-Aware Training (QAT), which incorporates low-precision simulation during training for robustness at ultra-low bitwidths. In particular, we develop a QAT methodology for a neural receiver architecture and benchmark it against a PTQ approach across diverse 3GPP Clustered Delay Line (CDL) channel profiles under both Line-of-Sight (LoS) and Non-LoS (NLoS) conditions, with user velocities up to 40 m/s. Results show that 4-bit and 8-bit QAT models achieve BLERs comparable to FP32 models at a 10% target BLER. Moreover, QAT models succeed in NLoS scenarios where PTQ models fail to reach the 10% BLER target, while also yielding an 8× compression with respect to full precision. These results demonstrate that QAT is a key enabler of low-complexity, latency-constrained inference at the PHY layer, facilitating real-time processing in 6G edge devices.
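The two quantitative criteria in the abstract, compression relative to FP32 and reaching a 10% target BLER, reduce to simple arithmetic. The sketch below assumes weights are stored at exactly $b$ bits with no per-tensor scale or zero-point overhead, and that BLER is the fraction of erroneous transport blocks in a simulation run; the helper names and the example counts are hypothetical and do not come from the paper.

```python
FP32_BITS = 32


def weight_memory_bytes(num_params: int, bits: int) -> float:
    """Storage for num_params weights at the given bitwidth.
    Assumption: ignores scale/zero-point and packing overhead."""
    return num_params * bits / 8


def compression_ratio(bits: int) -> float:
    """Weight-storage compression relative to FP32 (32 bits per weight)."""
    return FP32_BITS / bits


def bler(block_errors: int, blocks: int) -> float:
    """Block error rate: erroneous blocks divided by transmitted blocks."""
    return block_errors / blocks


def meets_target(block_errors: int, blocks: int, target: float = 0.10) -> bool:
    """True if the measured BLER is at or below the target (10% by default)."""
    return bler(block_errors, blocks) <= target
```

For example, `compression_ratio(4)` returns `8.0` and `compression_ratio(8)` returns `4.0`, which is why an 8× figure corresponds to the 4-bit models; a hypothetical run with 95 block errors out of 1000 blocks gives a BLER of 0.095 and meets the 10% target.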
