A 1.2 mm$^2$ 416 mW 1.44 Mmat/s 64$\times$16 Matrix Preprocessing ASIC for Massive MIMO in 22FDX

Darja Nonaca, Christoph Studer

TL;DR

This work proposes a novel preprocessing architecture for massive MU-MIMO based on the block-LDL matrix factorization, which improves parallelism and, hence, reduces latency. The effectiveness of the architecture is demonstrated through system simulations with mmWave channel vectors and measurements of a 22FDX ASIC.

Abstract

Massive multiuser (MU) multiple-input multiple-output (MIMO) enables concurrent transmission of multiple users to a multi-antenna basestation (BS). To detect the users' data using linear equalization, the BS must perform preprocessing, which requires, among other tasks, the inversion of a matrix whose dimension equals the number of user data streams. Explicit inversion of large matrices is notoriously difficult to implement due to high complexity, stringent data dependencies that lead to high latency, and high numerical precision requirements. We propose a novel preprocessing architecture based on the block-LDL matrix factorization, which improves parallelism and, hence, reduces latency. We demonstrate the effectiveness of our architecture through (i) massive MU-MIMO system simulations with mmWave channel vectors and (ii) measurements of a 22FDX ASIC, which is, to our knowledge, the first fabricated preprocessing engine for massive MU-MIMO with 64 BS antennas and 16 single-antenna users. Our ASIC reaches a clock frequency of 870 MHz while consuming 416 mW. At its peak throughput, the ASIC preprocesses 1.44 M 64$\times$16 matrices per second at a latency of only 0.7 $\mu$s.
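
The headline figures in the abstract can be related to each other with a few lines of arithmetic. The sketch below is not from the paper: the function names are hypothetical, and the energy estimate assumes that the quoted 416 mW is drawn while the ASIC runs at its peak throughput, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 870e6            # clock frequency
POWER_W = 0.416             # power consumption
THROUGHPUT_MAT_S = 1.44e6   # peak throughput in matrices per second
LATENCY_S = 0.7e-6          # latency per preprocessed matrix


def cycles_per_matrix(clock_hz, throughput_mat_s):
    """Average number of clock cycles spent per preprocessed matrix."""
    return clock_hz / throughput_mat_s


def latency_in_cycles(clock_hz, latency_s):
    """Latency expressed in clock cycles."""
    return clock_hz * latency_s


def energy_per_matrix_nj(power_w, throughput_mat_s):
    """Energy per matrix in nanojoules.

    Assumption (not stated in the abstract): the quoted power is the
    power drawn at peak throughput.
    """
    return power_w / throughput_mat_s * 1e9


if __name__ == "__main__":
    print(round(cycles_per_matrix(CLOCK_HZ, THROUGHPUT_MAT_S)))
    print(round(latency_in_cycles(CLOCK_HZ, LATENCY_S)))
    print(round(energy_per_matrix_nj(POWER_W, THROUGHPUT_MAT_S)))
```

Under these assumptions, one matrix takes about 604 cycles on average, the 0.7 $\mu$s latency corresponds to about 609 cycles, and the energy comes to roughly 289 nJ per matrix. The two cycle counts nearly coincide, which would be consistent with the engine finishing one matrix before starting the next, but this is an inference from the rounded numbers rather than a statement from the paper.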

Paper Structure

This paper contains 14 sections, 8 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Top-level architecture of the implemented preprocessing engine. The bus width of the input and output data accommodates the size of a row of the channel matrix $\mathbf{H}$ (16 complex values of 21 bits per part). The bus at the interface with the register array fits a $2\times2$ matrix with $4$ complex values.
  • Figure 2: Schedule (in clock cycles) of each preprocessing step.
  • Figure 3: Architecture details of the systolic array that supports two modes: Gram-matrix computation (blue datapath) and backward substitution (red datapath). We illustrate an architecture for $U=4$ users.
  • Figure 4: BLDL-factorization engine. The architecture is composed of four arithmetic units: MMAC, MSUB, MINV and MMULT. The units operate on $2\times2$ matrices and are controlled by an FSM in a processor-like fashion.
  • Figure 5: Uncoded bit-error rate (BER) vs. signal-to-noise ratio (SNR) of LMMSE-based equalization for (a) non-LoS and (b) LoS mmWave channels.
  • ...and 1 more figure
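
The Figure 4 caption describes a factorization engine that operates on $2\times2$ matrices. As a minimal software sketch of the underlying idea, the code below factors a Hermitian positive definite matrix as $\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^H$ with $2\times2$ blocks, where $\mathbf{L}$ is block unit lower triangular and $\mathbf{D}$ is block diagonal. This is a textbook block-LDL recursion in plain Python, not the authors' hardware schedule or number format. All function names (`mat_mul`, `mat_sub`, `mat_inv`, `block_ldl`, and so on) are hypothetical, and any correspondence between these helpers and the MMULT, MSUB, and MINV units is only a loose analogy.

```python
def mat_mul(a, b):
    """Product of two 2x2 blocks."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def mat_herm(a):
    """Conjugate transpose of a 2x2 block."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mat_inv(a):
    """Closed-form inverse of a 2x2 block (assumes a nonzero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def to_blocks(a):
    """Split an even-sized square matrix into a grid of 2x2 blocks."""
    n = len(a) // 2
    return [[[[a[2 * i + r][2 * j + c] for c in range(2)] for r in range(2)]
             for j in range(n)] for i in range(n)]


def block_ldl(blocks):
    """Block-LDL factorization of a Hermitian positive definite matrix.

    blocks[i][j] is the 2x2 block (i, j); only blocks with i >= j are read.
    Returns (L, D): L is a grid of 2x2 blocks with identity diagonal blocks,
    D is the list of 2x2 diagonal blocks.
    """
    n = len(blocks)
    zero = [[0, 0], [0, 0]]
    eye = [[1, 0], [0, 1]]
    L = [[eye if i == j else zero for j in range(n)] for i in range(n)]
    D = [None] * n
    for j in range(n):
        # D_j = A_jj - sum_{k<j} L_jk D_k L_jk^H
        acc = blocks[j][j]
        for k in range(j):
            acc = mat_sub(acc, mat_mul(mat_mul(L[j][k], D[k]),
                                       mat_herm(L[j][k])))
        D[j] = acc
        d_inv = mat_inv(D[j])  # only 2x2 inversions are ever needed
        for i in range(j + 1, n):
            # L_ij = (A_ij - sum_{k<j} L_ik D_k L_jk^H) D_j^{-1}
            acc = blocks[i][j]
            for k in range(j):
                acc = mat_sub(acc, mat_mul(mat_mul(L[i][k], D[k]),
                                           mat_herm(L[j][k])))
            L[i][j] = mat_mul(acc, d_inv)
    return L, D


def reconstruct(L, D):
    """Recompute the blocks of L D L^H, e.g. to check a factorization."""
    n = len(D)
    out = [[[[0, 0], [0, 0]] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                term = mat_mul(mat_mul(L[i][k], D[k]), mat_herm(L[j][k]))
                out[i][j] = mat_add(out[i][j], term)
    return out
```

For $U=16$ users this recursion would run over an $8\times8$ grid of blocks. Calling `block_ldl(to_blocks(a))` on a small Hermitian positive definite matrix `a` and comparing `reconstruct(L, D)` with `to_blocks(a)` is a quick way to check the sketch. Working on $2\times2$ blocks halves the length of the sequential dependency chain relative to a scalar LDL recursion, which is in line with the parallelism argument made in the abstract.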