MMGaP: Multi-User MIMO Detection and Precoding using GPU-assisted Physics-inspired Computation

Abhishek Kumar Singh, Kyle Jamieson

TL;DR

MMGaP closes the gap between theory and practice for physics-inspired MIMO processing in 5G by implementing, on GPUs, an uplink MU-MIMO detector driven by a Coherent Ising Machine (CIM) algorithm and a downlink vector perturbation precoder. It maps MIMO detection and precoding to Ising optimization problems and runs multiple anneals per problem on bare-metal CUDA kernels, which are packaged as TensorFlow ops and integrated with NVIDIA's Aerial GPU-accelerated 5G platform to reach line-rate performance. For an 8×8 system at 100 MHz, the approach improves throughput over traditional linear baselines by roughly 50 Mbps per UE on the uplink and roughly 100 Mbps per UE on the downlink. It also scales to larger MIMO sizes (e.g., 16×16), and the paper reports detailed microbenchmarks on A100, H100, and L4 GPUs. The results indicate that physics-inspired MIMO processing on commodity GPUs is feasible for real-world 5G deployments and can be integrated into existing GPU-accelerated stacks.

Abstract

Physics-inspired and quantum-computing-based methods for processing in the physical layer of next-generation cellular radio access networks have demonstrated theoretical advances in spectral efficiency in recent years, but have stopped short of practical realization on commodity processors, leaving a gap between the throughput practical systems can achieve and the projected throughput the state-of-the-art should achieve. To fill this gap, this paper proposes MMGaP, an uplink multi-user MIMO detector and downlink vector perturbation precoder for next-generation cellular networks. MMGaP realizes these large MIMO processing algorithms for the first time on bare-metal CUDA kernels that scale to run on large GPU processing platforms and can be packaged as TensorFlow modules, allowing easy integration with a variety of systems. We integrate MMGaP with NVIDIA's software-defined, GPU-accelerated 5G platform and evaluate its performance against the state-of-the-art. In a 5G cellular network using 100 MHz of radio bandwidth, eight antennas at the base station, and eight concurrent users, we show that MMGaP improves uplink throughput by approximately 50 Mbps per user and downlink throughput by 100 Mbps per user over a wide range of SNR. We further show that MMGaP can also support larger MIMO sizes: for 16 antennas at the base station and 16 concurrent users, MMGaP provides more than 50 Mbps higher uplink throughput per user. We measure the execution time of MMGaP on different NVIDIA GPUs and show that it can operate at line rate and meet the timing requirements of state-of-the-art 5G systems.

Paper Structure

This paper contains 24 sections, 15 equations, 16 figures.

Figures (16)

  • Figure 1: High-level design (uplink): Our proposed system, MMGaP, is part of the physical layer data plane (where it performs MIMO detection) and the MAC layer control plane (where it selects the best parameters, modulation, and coding schemes) in the 5G system.
  • Figure 2: Coherent Ising Machine-based MIMO detection: The MIMO detection problem for the uplink MU-MIMO system is converted into an Ising optimization problem, which is solved using the CIM-CAC algorithm (simplified working equations in the bottom left). The same problem instance is solved multiple times; each run is called an "anneal". The bottom right shows that each anneal converges to a different state because CIM-CAC is randomly initialized. These states can correspond to sub-optimal solutions of the Ising problem or to the optimal solution (top right). We select the best solution found by CIM-CAC and convert it back to the corresponding MIMO solution.
  • Figure 3: Working principle of the Degenerate Optical Parametric Oscillator-based Coherent Ising Machine (DOPO-CIM): this example illustrates two couplings, between the pulse representing $s_3$ and the pulses representing $s_1$ and $s_2$ (implementing the term $J_{13}s_{1}s_{3} + J_{23}s_{2}s_{3}$).
  • Figure 4: MMGaP processing flow: the entire bandwidth is split into as many equal parts as there are available GPUs, and each part is assigned to a different GPU. Within a GPU, each MIMO problem is assigned to a different CUDA block. Within a CUDA block, $N_a$ CUDA threads perform $N_a$ anneals in parallel for the assigned MIMO instance.
  • Figure 5: MMGaP's matrix-vector multiplication (MVM) optimization: Exploiting the internal structure of the Ising coefficients generated from the MIMO detection problem allows the full MVM to be decomposed into a smaller MVM and a few vector operations. (a) Ising coefficients stacked as a matrix, (b) CIM states stacked as two vectors and a scalar, (c) the required MVM (half the original size), (d) the required element-wise products, (e) the required scaling operations, and (f) the required dot product.
  • ...and 11 more figures
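
The flow described in the captions for Figures 2 and 4 (map a MIMO detection problem to an Ising problem, run several independently initialized anneals, keep the lowest-energy result) can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration and are not taken from the paper: the channel is real-valued with BPSK symbols, the energy sign convention is the one stated in the docstring, all function names are hypothetical, and each "anneal" is a simple greedy spin-flip descent standing in for CIM-CAC, whose working equations are not reproduced on this page. The anneals run in a sequential loop, whereas Figure 4 assigns each anneal to its own CUDA thread.

```python
import random


def mimo_to_ising(H, y):
    """Map min ||y - H s||^2 over s in {-1,+1}^n to Ising couplings J and fields h.

    Energy convention (an assumption of this sketch):
        E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i
    """
    m, n = len(H), len(H[0])
    # Gram matrix G = H^T H and matched-filter output z = H^T y.
    G = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    z = [sum(H[k][i] * y[k] for k in range(m)) for i in range(n)]
    J = [[0.0 if i == j else -2.0 * G[i][j] for j in range(n)] for i in range(n)]
    h = [2.0 * z[i] for i in range(n)]
    # Constant dropped by the mapping: ||y||^2 + trace(G).
    const = sum(v * v for v in y) + sum(G[i][i] for i in range(n))
    return J, h, const


def ising_energy(J, h, s):
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(h[i] * s[i] for i in range(n))


def residual(H, y, s):
    """Squared Euclidean distance ||y - H s||^2, the original detection objective."""
    return sum((y[r] - sum(H[r][c] * s[c] for c in range(len(s)))) ** 2
               for r in range(len(H)))


def one_anneal(J, h, rng):
    """Stand-in for one anneal (NOT CIM-CAC): random start, greedy single-spin flips."""
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Flipping s_i changes the energy by 2 * s_i * (local field on spin i).
            local = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
            if 2.0 * s[i] * local < -1e-12:
                s[i] = -s[i]
                improved = True
    return s


def detect(H, y, num_anneals=16, seed=0):
    """Run several anneals on the same instance and keep the lowest-energy state."""
    J, h, _ = mimo_to_ising(H, y)
    rng = random.Random(seed)
    candidates = [one_anneal(J, h, rng) for _ in range(num_anneals)]
    energies = [ising_energy(J, h, s) for s in candidates]
    best = min(range(num_anneals), key=energies.__getitem__)
    return candidates[best], energies


# Toy 4x4 real-valued channel with BPSK symbols (made-up numbers, no noise).
H = [[1.0, 0.4, -0.2, 0.1],
     [0.3, 0.9, 0.5, -0.3],
     [-0.1, 0.2, 1.1, 0.4],
     [0.2, -0.5, 0.3, 0.8]]
x_true = [1, -1, -1, 1]
y = [sum(H[r][c] * x_true[c] for c in range(4)) for r in range(4)]

J, h, const = mimo_to_ising(H, y)
s_best, energies = detect(H, y)
print("best spins:", s_best)
print("distinct anneal energies:", sorted(set(round(e, 6) for e in energies)))
print("residual of best:", residual(H, y, s_best))
```

Because the Ising energy differs from the detection residual only by the constant `const`, choosing the lowest-energy anneal is the same as choosing the candidate closest to the received vector. Different anneals may settle in different local minima, which is why the best of several runs is kept.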