Cross-Attention Transformer for Joint Multi-Receiver Uplink Neural Decoding

Xavier Tardy; Grégoire Lefebvre; Apostolos Kountouris; Haïfa Fares; Amor Nafkha

TL;DR

This work tackles multi-AP uplink decoding for OFDM with a cross-attention Transformer that jointly processes the observations of multiple coordinated APs. A shared self-attention encoder first encodes each AP's received grid, and a token-wise, anchor-based cross-attention module then fuses the AP views to produce per-receiver soft information in the form of log-likelihood ratios $L_i$, without requiring explicit per-AP CSI. Trained with a Bit-Metric Decoding objective, the model learns data-dependent fusion that adapts to per-AP reliability and remains robust to missing links and pilot sparsity. It achieves BER gains over LS/LMMSE and CNN baselines and approaches or surpasses a perfect-CSI reference in higher cooperation regimes. The model is compact ($0.15$M parameters, $0.24$ GFLOPs), has low latency on GPUs, and demonstrates practical viability as a building block for next-generation Wi-Fi receivers across realistic 3GPP TR 38.901 UMi channels. The result is a scalable, robust fusion mechanism for cooperative uplink reception, relevant to Wi-Fi 7/8 deployments.
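
The token-wise fusion step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `fuse_tokenwise`, the `available` mask, and the reading of "anchor-based" (the anchor AP's token is the query, and the same-index tokens of all available APs serve as keys and values) are assumptions made for illustration. Learned query/key/value projections, multiple heads, and the shared encoder are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_tokenwise(views, anchor, available=None):
    """Fuse per-AP token sequences for one anchor AP (hypothetical helper).

    views[r][t] is the d-dimensional token t of AP r (already encoded).
    For each token index t, the anchor's token is the query and the tokens
    at the same index from all available APs are keys and values.
    Identity projections are assumed here; a real model would learn them.
    Returns (fused_tokens, weights), with weights[t][r] the attention on AP r.
    """
    n_rx = len(views)
    if available is None:
        available = [True] * n_rx
    if not available[anchor]:
        raise ValueError("the anchor AP must be available")
    n_tokens = len(views[0])
    dim = len(views[0][0])
    active = [r for r in range(n_rx) if available[r]]

    fused, weights = [], []
    for t in range(n_tokens):
        q = views[anchor][t]
        # Scaled dot-product score between the anchor token and each AP token.
        scores = [
            sum(a * b for a, b in zip(q, views[r][t])) / math.sqrt(dim)
            for r in active
        ]
        w = softmax(scores)
        # Weighted sum of the value tokens (missing APs are simply excluded).
        out = [
            sum(w[i] * views[r][t][j] for i, r in enumerate(active))
            for j in range(dim)
        ]
        fused.append(out)
        row = [0.0] * n_rx
        for i, r in enumerate(active):
            row[r] = w[i]
        weights.append(row)
    return fused, weights


# Three APs, two tokens each, token dimension 2 (toy values).
views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused_all, w_all = fuse_tokenwise(views, anchor=0)
fused_drop, w_drop = fuse_tokenwise(views, anchor=0, available=[True, False, True])
```

Because the attention weights are recomputed per token from the data, an AP whose token correlates poorly with the anchor receives less weight, and a missing link is handled by excluding that AP from the softmax rather than by retraining.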

Abstract

We propose a cross-attention Transformer for joint decoding of uplink OFDM signals received by multiple coordinated access points. A shared per-receiver encoder learns time-frequency structure within each received grid, and a token-wise cross-attention module fuses the receivers to produce soft log-likelihood ratios for a standard channel decoder, without requiring explicit per-receiver channel estimates. Trained with a bit-metric objective, the model adapts its fusion to per-receiver reliability, tolerates missing or degraded links, and remains robust when pilots are sparse. Across realistic Wi-Fi channels, it consistently outperforms classical pipelines and strong convolutional baselines, frequently matching (and in some cases surpassing) a powerful baseline that assumes perfect channel knowledge per access point. Despite its expressiveness, the architecture is compact ($0.15$M parameters), has low computational cost ($0.24$ GFLOPs), and achieves low latency on GPUs, making it a practical building block for next-generation Wi-Fi receivers.
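
For readers unfamiliar with bit-metric training, the sketch below shows one common form of such an objective: the binary cross-entropy between the transmitted coded bits and the probabilities implied by the output log-likelihood ratios, expressed in bits. The function names and the sign convention $L = \log\frac{P(b=1\mid y)}{P(b=0\mid y)}$ are assumptions for illustration; the abstract does not state the paper's exact loss or convention.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bit_metric_loss(llrs, bits):
    """Mean binary cross-entropy in bits per coded bit (hypothetical helper).

    Assumes llr = log(P(b=1 | y) / P(b=0 | y)); flip the sign of the LLRs
    if the opposite convention is used.
    """
    if len(llrs) != len(bits) or not llrs:
        raise ValueError("llrs and bits must be non-empty and equally long")
    total = 0.0
    for llr, b in zip(llrs, bits):
        sign = 1.0 if b == 1 else -1.0
        # -log P(b | y) = log(1 + exp(-sign * llr)) under the assumed convention.
        total += softplus(-sign * llr)
    return total / (len(llrs) * math.log(2.0))


def bit_metric_rate(llrs, bits):
    """Empirical achievable-rate estimate per coded bit: 1 minus the loss."""
    return 1.0 - bit_metric_loss(llrs, bits)


# Uninformative LLRs (all zero) cost one bit per coded bit under this loss.
uninformative = bit_metric_loss([0.0, 0.0], [0, 1])
```

Minimizing this loss is equivalent to maximizing the rate estimate, which is why LLRs trained this way are directly usable as inputs to a standard soft-input channel decoder.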

Paper Structure (26 sections, 14 equations, 3 figures, 3 tables)

Introduction
State of the art
Classical Estimators: LS and LMMSE
Point-to-Point Data-Driven Receivers
CNN-based receiver
LSTM-based receiver
Transformer-based receiver
From per-AP processing to coordinated multi-AP uplink
System Model and Problem Formulation
Assumptions
OFDM Transmission Model
Channel Model
Multi-AP Coordination and Decoding Objective
Proposed Transformer-based Joint Decoder
Network Architecture
...and 11 more sections

Figures (3)

Figure 1: Neural coordinated decoding with three APs.
Figure 2: Architecture of the proposed Transformer joint decoder.
Figure 3: BER performance vs. $E_b/N_0$ for varying cooperation levels ($N_R = 1, 2, 3$) and pilot configurations (1 vs. 2 pilot columns).