Beam Selection in ISAC using Contextual Bandit with Multi-modal Transformer and Transfer Learning

Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci

TL;DR

This work addresses beam selection in ISAC-enabled indoor 6G networks by framing it as a multi-agent contextual bandit problem in which each UE selects a beam based on ISAC sensing data and the UE location. The pipeline combines convolutional feature extraction, a multi-modal transformer encoder, and linear layers to produce a Q-value for each action, enabling cooperative beam decisions that account for inter-user interference. Transfer reinforcement learning adapts policies learned in single-user scenarios to multi-user settings, reducing training time while maintaining or improving spectral efficiency (SE). Experiments on DeepSense 6G show average SE regret improvements of $49.6\%$ over DRL in the single-UE case and $19.7\%$ over training from scratch in the multi-UE case, with near-optimal performance compared to exhaustive search. The approach generalizes across diverse indoor environments and demonstrates the practical potential of ISAC data to enhance beam management in dynamic 6G networks.

Abstract

Sixth-generation (6G) wireless technology is anticipated to introduce Integrated Sensing and Communication (ISAC) as a transformative paradigm. ISAC unifies wireless communication with radar or other forms of sensing to optimize spectral and hardware resources. This paper presents a framework that leverages ISAC sensing data to enhance beam selection in complex indoor environments. By integrating multi-modal transformer models with a multi-agent contextual bandit algorithm, our approach uses ISAC sensing data to improve communication performance and achieve high spectral efficiency (SE). Specifically, the multi-modal transformer captures inter-modal relationships, which improves model generalization across diverse scenarios. Experimental evaluations on the DeepSense 6G dataset demonstrate that our model outperforms traditional deep reinforcement learning (DRL) methods, achieving superior beam prediction accuracy and adaptability. In the single-user scenario, we achieve an average SE regret improvement of 49.6% compared to DRL. Furthermore, we employ transfer reinforcement learning to reduce training time and improve model performance in multi-user environments. In the multi-user scenario, this approach improves the average SE regret, which measures how far the learned policy is from the SE-optimal policy, by 19.7% compared to training from scratch, even when the latter is trained 100 times longer.
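The abstract describes SE regret only informally, as the distance between the learned policy and the SE-optimal policy. A minimal sketch of one way to compute such a quantity follows. It assumes that regret at each time step is the plain difference between the best achievable SE and the SE of the chosen beam, and that SE follows the Shannon formula log2(1 + SINR); the paper's exact definition (for example, any normalization) may differ. The function names and the input layout are hypothetical.

```python
import math
from typing import Sequence


def spectral_efficiency(sinr_linear: float) -> float:
    """Shannon spectral efficiency in bit/s/Hz for a linear-scale SINR."""
    return math.log2(1.0 + sinr_linear)


def average_se_regret(
    sinr_per_beam: Sequence[Sequence[float]],
    chosen_beams: Sequence[int],
) -> float:
    """Mean gap between the best achievable SE and the SE of the chosen beam.

    sinr_per_beam[t][b] is the (assumed known) linear SINR of beam b at step t;
    chosen_beams[t] is the beam index the policy picked at step t.
    """
    if len(sinr_per_beam) != len(chosen_beams):
        raise ValueError("one chosen beam is required per time step")
    if not chosen_beams:
        raise ValueError("at least one time step is required")

    total = 0.0
    for sinrs, beam in zip(sinr_per_beam, chosen_beams):
        # Exhaustive search over beams gives the per-step optimum.
        best_se = max(spectral_efficiency(s) for s in sinrs)
        total += best_se - spectral_efficiency(sinrs[beam])
    return total / len(chosen_beams)


# Two time steps, two beams each; the policy picks beam 0 both times.
regret = average_se_regret([[1.0, 3.0], [7.0, 1.0]], [0, 0])
```

In this toy input the policy loses 1 bit/s/Hz at the first step and nothing at the second. Under this reading, a regret of zero means the policy always matches exhaustive search.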

Paper Structure

This paper contains 22 sections, 5 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: A contextual bandit agent. The agent's context consists of the ISAC image and the UE location data, which pass through three stages: convolutional layers, the MMT encoder, and linear layers. The output is the Q-value for each action.
  • Figure 2: Average SE regret as a function of epochs.
  • Figure 3: Reward and Spectral Efficiency for each time step during the testing stage.
  • Figure 4: Reward comparison for each test time step between the TRL model (100 epochs) and the TL-TRL model (1 epoch).
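The agent in Figure 1 maps a context to one Q-value per action and picks a beam from those values. The sketch below illustrates the contextual bandit part of that loop only. It replaces the convolutional layers, MMT encoder, and linear layers with a single linear Q-function over a hand-made feature vector, and it uses epsilon-greedy exploration; both are assumptions made for illustration, not the paper's model. The class name, parameters, and toy contexts are hypothetical.

```python
import random
from typing import List, Sequence


class LinearBanditAgent:
    """Contextual bandit with one linear Q-function per beam (illustrative only)."""

    def __init__(self, n_features: int, n_beams: int,
                 lr: float = 0.1, epsilon: float = 0.1, seed: int = 0) -> None:
        self.weights: List[List[float]] = [[0.0] * n_features for _ in range(n_beams)]
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def q_values(self, context: Sequence[float]) -> List[float]:
        """Return one Q-value per beam for the given context."""
        return [sum(w * x for w, x in zip(row, context)) for row in self.weights]

    def select_beam(self, context: Sequence[float], explore: bool = True) -> int:
        """Epsilon-greedy choice: a random beam with probability epsilon, else argmax Q."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        q = self.q_values(context)
        return max(range(len(q)), key=q.__getitem__)

    def update(self, context: Sequence[float], beam: int, reward: float) -> None:
        """One gradient step on the squared error between Q(context, beam) and the reward.

        In a contextual bandit the target is the immediate reward, so there is
        no bootstrapped next-state term as in full reinforcement learning.
        """
        error = reward - self.q_values(context)[beam]
        row = self.weights[beam]
        for i, x in enumerate(context):
            row[i] += self.lr * error * x


# Toy usage: context [1, 0] favors beam 0, context [0, 1] favors beam 1.
agent = LinearBanditAgent(n_features=2, n_beams=2, lr=0.5, epsilon=0.0)
for _ in range(20):
    agent.update([1.0, 0.0], beam=0, reward=1.0)
    agent.update([0.0, 1.0], beam=1, reward=1.0)
```

With a structure like this, one plausible form of the transfer step would be to initialize a multi-user agent from weights learned in the single-user setting before further training; how the paper actually performs the transfer is not specified in this section.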