LWM-Temporal: Sparse Spatio-Temporal Attention for Wireless Channel Representation Learning

Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb

TL;DR

Experimental results on channel prediction across multiple mobility regimes show consistent improvements over strong baselines, particularly under long horizons and limited fine-tuning data, highlighting the importance of geometry-aware architectures and geometry-consistent pretraining for learning transferable spatiotemporal wireless representations.

Abstract

LWM-Temporal is a new member of the Large Wireless Models (LWM) family that targets the spatiotemporal nature of wireless channels. Designed as a task-agnostic foundation model, LWM-Temporal learns universal channel embeddings that capture mobility-induced evolution and are reusable across various downstream tasks. To achieve this objective, LWM-Temporal operates in the angle-delay-time domain and introduces Sparse Spatio-Temporal Attention (SSTA), a propagation-aligned attention mechanism that restricts interactions to physically plausible neighborhoods, reducing attention complexity by an order of magnitude while preserving geometry-consistent dependencies. LWM-Temporal is pretrained in a self-supervised manner using a physics-informed masking curriculum that emulates realistic occlusions, pilot sparsity, and measurement impairments. Experimental results on channel prediction across multiple mobility regimes show consistent improvements over strong baselines, particularly under long horizons and limited fine-tuning data, highlighting the importance of geometry-aware architectures and geometry-consistent pretraining for learning transferable spatiotemporal wireless representations.
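The move to the angle-delay-time domain can be illustrated with a small sketch. The abstract does not give the exact transform, so the code below assumes a common choice: a uniform linear array and an OFDM grid, with a DFT across antennas (space to angle) and an inverse DFT across subcarriers (frequency to delay), applied frame by frame. The function names `to_angle_delay` and `synth_single_path` and all array sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def to_angle_delay(h_sf):
    """Map space-frequency channels to the angle-delay domain, frame by frame.

    h_sf: complex array of shape (T, n_ant, n_sub) = (time, antenna, subcarrier).
    Assumption (not taken from the paper): DFT over antennas, IDFT over subcarriers.
    """
    # DFT across the antenna axis gives angle bins; "ortho" keeps total energy unchanged.
    h_angle = np.fft.fft(h_sf, axis=1, norm="ortho")
    # Inverse DFT across the subcarrier axis gives delay taps.
    return np.fft.ifft(h_angle, axis=2, norm="ortho")

def synth_single_path(T, n_ant, n_sub, angle_bin, delay_tap):
    """Toy static channel with one on-grid path (hypothetical test signal)."""
    n = np.arange(n_ant)[:, None]   # antenna index
    k = np.arange(n_sub)[None, :]   # subcarrier index
    steering = np.exp(2j * np.pi * angle_bin * n / n_ant)    # phase ramp over antennas
    freq_resp = np.exp(-2j * np.pi * delay_tap * k / n_sub)  # phase ramp over subcarriers
    frame = steering * freq_resp
    return np.repeat(frame[None], T, axis=0)

h = synth_single_path(T=3, n_ant=8, n_sub=16, angle_bin=2, delay_tap=5)
ad = to_angle_delay(h)
peak = np.unravel_index(np.argmax(np.abs(ad[0])), ad[0].shape)
nonzero = int(np.sum(np.abs(ad[0]) > 1e-6))
```

In this toy case every space-frequency entry has unit magnitude, while the angle-delay frame concentrates all energy in a single bin. That is the kind of sparsity the abstract relies on when restricting attention to physically plausible neighborhoods. Real channels have off-grid paths, so energy leaks into adjacent bins rather than landing on exactly one.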

Paper Structure

This paper contains 14 sections, 12 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: This figure illustrates the geometry-driven evolution of wireless channels in dynamic environments and its role in efficient modeling. As users move, propagation paths evolve according to the underlying geometry, inducing structured changes in angle, delay, and Doppler. Transforming channels into the angle-delay-time domain exposes these physical and temporal structures in a more interpretable and sparser representation, enabling efficient tokenization and sparse attention over physically plausible interactions that are less explicit in the original space-frequency-time domain.
  • Figure 2: LWM-Temporal overview. Channels are transformed to angle-delay, tokenized, and pretrained with physics-informed masking and RoPE-based sparse spatiotemporal attention (SSTA) to produce embeddings. SSTA is bidirectional for reconstruction and causal for forecasting to prevent leakage.
  • Figure 3: Attention patterns for angle-delay sequences. Colored strips show the flattened token order (time, angle, delay); each matrix shows allowable attention links. (a) Full attention: all-to-all, $\mathcal{O}(N^2)$. (b) SSTA: predefined local neighbors, $\mathcal{O}(KN)$. (c) SSTA (routed): $K_r$ adaptive neighbors, $\mathcal{O}(K_rN)$.
  • Figure 4: Visualization of four physics-informed masking strategies. The first three masks (rectangular, pilot-lattice, random) operate within individual frames, while the spatiotemporal tube mask creates a continuous occluded region that evolves across multiple time steps.
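
The attention patterns in Figure 3 can be sketched as a boolean mask over the flattened (time, angle, delay) token order. This is a minimal illustration, not the paper's implementation: the radii `r_t`, `r_a`, `r_d`, the grid sizes, and the function name `local_attention_mask` are hypothetical, angle bins are not wrapped around, and the routed variant (c) is not covered. The `causal` flag follows the Figure 2 caption, where attention is bidirectional for reconstruction and causal for forecasting.

```python
import numpy as np

def local_attention_mask(T, A, D, r_t=1, r_a=1, r_d=1, causal=False):
    """Boolean (N, N) mask, N = T*A*D; entry [q, k] is True if query q may attend to key k.

    Tokens are flattened in (time, angle, delay) order, delay varying fastest.
    A link is allowed when the two tokens are within the given radius on every axis.
    """
    t, a, d = np.meshgrid(np.arange(T), np.arange(A), np.arange(D), indexing="ij")
    t, a, d = t.ravel(), a.ravel(), d.ravel()   # per-token coordinates
    dt = t[None, :] - t[:, None]                # key time minus query time
    da = np.abs(a[None, :] - a[:, None])
    dd = np.abs(d[None, :] - d[:, None])
    mask = (np.abs(dt) <= r_t) & (da <= r_a) & (dd <= r_d)
    if causal:
        mask &= dt <= 0                         # no attention to future frames
    return mask

T, A, D = 4, 8, 8
N = T * A * D
full_links = N * N                              # panel (a): all-to-all
mask = local_attention_mask(T, A, D)            # panel (b): local neighbors
causal_mask = local_attention_mask(T, A, D, causal=True)
K = int(mask.sum(axis=1).max())                 # largest neighborhood size
sparse_links = int(mask.sum())
```

With unit radii each token has at most $K = 3^3 = 27$ neighbors, so the number of links is bounded by $KN$ rather than $N^2$. The dense $N \times N$ mask is built here only to make the pattern visible; an efficient implementation would gather the $K$ neighbors per token instead of materializing the full matrix.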