Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs

Junyao Zheng, Chung-Hsuan Tung, Yuncheng Yao, Nishant Mehrotra, Sandesh Mattu, Zhenzhou Qi, Danyang Zhuo, Robert Calderbank, Tingjun Chen

Abstract

Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. However, its explicit delay-Doppler (DD) domain representation incurs substantial signal processing complexity, especially with increased DD domain grid sizes. To address this challenge, we present a scalable, real-time Zak-OTFS receiver architecture on GPUs through hardware--algorithm co-design that exploits DD-domain channel sparsity. Our design leverages compact matrix operations for key processing stages, a branchless iterative equalizer, and a structured sparse representation of the DD-domain channel matrix to significantly reduce computational and memory overhead. These optimizations enable low-latency processing that consistently meets the 99.9th-percentile real-time processing deadline. The proposed system achieves up to 906.52 Mbps throughput with a DD grid size of (16384, 32) using 16QAM modulation over 245.76 MHz bandwidth. Extensive evaluations under a Vehicular-A channel model demonstrate strong scalability and robust performance across CPU (Intel Xeon) and multiple GPU platforms (NVIDIA Jetson Orin, RTX 6000 Ada, A100, and H200), highlighting the effectiveness of compute-aware Zak-OTFS receiver design for next-generation (NextG) high-mobility communication systems.

Paper Structure

This paper contains 28 sections, 41 equations, 15 figures, 4 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Zak-OTFS signal processing pipeline based on discrete Zak transform (DZT). A point-pilot frame and a data frame are concatenated to form a packet. The simulated channel considers Veh-A itur_m1225 for paths with delay and Doppler shifts, and AWGN for SNR adjustment. The receiver pipeline stages include DZT, channel estimation, equalization, and demodulation, which will be discussed in Section \ref{subsec:prelim_pipeline}.
  • Figure 2: Measured compute latency and scalability on $\textbf{A} \in \mathbb{C}^{{N_\text{d}}\times{N_\text{d}}}$ across CPU and GPU platforms, including (a) matrix inversion, and (b) matrix-vector multiplication (MVM), with varying matrix dimension $N_\text{d}$. GPU-based matrix operations achieve lower latency and better scalability for ${N_\text{d}} > 256$.
  • Figure 3: The $\widehat{\textbf{h}}_{\textrm{eff}}$ and $\widehat{\textbf{H}}_{\textrm{dd}}$ magnitudes for $(M,N)=(8,2)$. The binarized versions depict the dominant entries using a threshold of $\theta = 0.12$, where any entry with magnitude below the threshold is set to zero. (a) The significant bins in $\widehat{\textbf{h}}_{\textrm{eff}}$ correspond to the dominant channel paths. (b) The channel matrix $\widehat{\textbf{H}}_{\textrm{dd}}$ is divided into $N\times N$ smaller sub-matrices of dimension $M\times M$. (c) Thresholding isolates the most significant paths in $\widehat{\textbf{h}}_{\textrm{eff}}$, with each path at $({k}_{p},{l}_{p})$. In this example, two paths are located at (4, 1) in blue and at (0, 0) in orange. The blue path is associated with zero delay and zero Doppler shift, while the orange path corresponds to a delay shift of 0.017 ms and a Doppler shift of 15 kHz. (d) The resultant patterns in $\widehat{\textbf{H}}_{\textrm{dd}}$, where the colors associate the dominant paths in the binarized $\widehat{\textbf{h}}_{\textrm{eff}}$ and $\widehat{\textbf{H}}_{\textrm{dd}}$. For each row $q$ in $\widehat{\textbf{H}}_{\textrm{dd}}$, the dominant path $p$ maps to column ${r}_{p}(q)$ as a function of $({k}_{p},{l}_{p})$, per \ref{eq:map_chMatDD_col}.
  • Figure 4: BER and ${c_\textrm{norm}}$ vs. iterations in CGA for $(M,N)=(128,32)$, 16QAM. Across iterations, ${c_\textrm{norm}}$ exhibits monotonic decay with a diminishing slope, whereas BER behaves more variably, including flat (SNR = 0 dB) and rebounding (SNR = 30 dB) trends. In addition, ${c_\textrm{norm}}$ scales exponentially with SNR, a dependence that BER does not exhibit.
  • Figure 5: Median end-to-end processing latency across hardware platforms (CPU, GPU), equalizers (LMMSE, MRC, and CGA), and structured-sparsity (SS) awareness, with $N=32$. The pilot/data frame duration is $T=1.067$ ms and the processing deadline is $2T=2.134$ ms.
  • ...and 10 more figures
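
The structured-sparsity step illustrated in Figure 3 (a)–(c) can be sketched as follows. This is a minimal, hedged example: it binarizes an effective channel estimate with the magnitude threshold $\theta = 0.12$ from the figure and reads off the dominant path locations $(k_p, l_p)$. The grid size $(M, N) = (8, 2)$ and the two path locations (4, 1) and (0, 0) match the caption, but the `h_eff` values below are synthetic placeholders, not the paper's data.

```python
import numpy as np

# Grid size and threshold taken from the Figure 3 caption.
M, N = 8, 2
theta = 0.12

# Synthetic effective channel estimate: weak background entries well below
# the threshold, plus two strong paths at the locations used in the figure.
rng = np.random.default_rng(0)
h_eff = 0.02 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
h_eff[4, 1] = 1.0          # strong path at (k_p, l_p) = (4, 1)
h_eff[0, 0] = 0.6 + 0.3j   # strong path at (k_p, l_p) = (0, 0)

# Binarize: keep only entries whose magnitude reaches the threshold.
mask = np.abs(h_eff) >= theta
dominant_paths = sorted((int(i), int(j)) for i, j in zip(*np.nonzero(mask)))
# dominant_paths -> [(0, 0), (4, 1)]
```

Only the surviving entries of the binarized mask then need to be tracked when building the structured sparse $\widehat{\textbf{H}}_{\textrm{dd}}$, which is what reduces the memory and compute footprint.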
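
Figure 4's ${c_\textrm{norm}}$ behavior can be reproduced in spirit with a generic conjugate-gradient equalizer. This is a sketch under stated assumptions, not the paper's CGA: we assume a regularized normal-equation system $(\textbf{H}^{\mathsf{H}}\textbf{H} + \lambda \textbf{I})\,\textbf{x} = \textbf{H}^{\mathsf{H}}\textbf{y}$ and take ${c_\textrm{norm}}$ to be the normalized residual, which decays toward zero across iterations as the figure shows; the function name, $\lambda$, and iteration count are all hypothetical.

```python
import numpy as np

def cg_equalize(H, y, lam=1e-2, iters=20):
    """Conjugate-gradient solve of (H^H H + lam*I) x = H^H y.

    Returns the estimate x and a per-iteration list of normalized
    residual norms ||r|| / ||b|| (a stand-in for c_norm).
    """
    A = H.conj().T @ H + lam * np.eye(H.shape[1])
    b = H.conj().T @ y
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    norms = []
    for _ in range(iters):
        Ap = A @ p
        alpha = (r.conj() @ r) / (p.conj() @ Ap)   # step size
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new.conj() @ r_new) / (r.conj() @ r)
        p = r_new + beta * p                       # conjugate direction update
        r = r_new
        norms.append(float(np.linalg.norm(r) / np.linalg.norm(b)))
    return x, norms

# Small synthetic system for illustration.
rng = np.random.default_rng(1)
H = rng.standard_normal((16, 8)) + 1j * rng.standard_normal((16, 8))
y = rng.standard_normal(16) + 1j * rng.standard_normal(16)
x_hat, c_norms = cg_equalize(H, y)
```

Because CG is branchless in its inner loop (fixed sequence of matrix-vector products and axpy updates), a fixed iteration budget like this maps naturally onto GPU execution, consistent with the paper's branchless-equalizer design goal.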