Causality-Driven Reinforcement Learning for Joint Communication and Sensing
Anik Roy, Serene Banerjee, Jishnu Sadasivan, Arnab Sarkar, Soumyajit Dey
TL;DR
This work tackles the challenge of learning high-dimensional beam patterns in joint communication and sensing (JCAS) for massive MIMO (mMIMO) systems by introducing a causality-aware reinforcement learning framework. The authors use a state-wise action refinement mechanism (TD3-INVASE) to identify the action dimensions that matter for the reward and prune the irrelevant ones, enabling efficient exploration of the beam codebook for both communication and sensing beams. They extend a three-stage communication design and a three-stage sensing design with a causal discovery component, achieving higher beamforming gains and improved sample efficiency over baselines in DeepMIMO-generated scenarios. The approach supports online adaptation and, potentially, generalization through causal reasoning, which makes it relevant to low-overhead, high-performance JCAS in dynamic wireless environments.
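As a rough illustration of what state-wise action-dimension selection means, the sketch below maps a state to per-dimension selection probabilities, turns them into a binary mask, and keeps the actor's output only on the selected dimensions. It is a minimal NumPy sketch under stated assumptions, not the TD3-INVASE implementation: the single linear layer with a sigmoid (`W`, `b`), the functions `select_action_dims` and `refine_action`, and the `default_action` fallback for pruned dimensions are all hypothetical. Training the selector (which INVASE-style methods do by comparing a predictor that sees only the selected inputs against a baseline that sees all of them) is omitted.

```python
import numpy as np


def select_action_dims(state, W, b, rng, threshold=None):
    """Hypothetical state-wise action-dimension selector (illustrative only).

    A single linear layer plus sigmoid stands in for the selector network:
    it maps a state vector to one selection probability per action dimension.
    If `threshold` is None, the mask is sampled from independent Bernoulli
    draws (exploration); otherwise dimensions with prob >= threshold are kept.
    """
    logits = W @ state + b                   # shape: (action_dim,)
    probs = 1.0 / (1.0 + np.exp(-logits))    # per-dimension selection probability
    if threshold is None:
        mask = (rng.random(probs.shape) < probs).astype(float)
    else:
        mask = (probs >= threshold).astype(float)
    return probs, mask


def refine_action(raw_action, mask, default_action):
    """Keep the actor's value on selected dimensions; use a default elsewhere.

    Assumption: pruned dimensions are held at a fixed default, so the agent
    effectively explores only the subspace picked out by the mask.
    """
    return mask * raw_action + (1.0 - mask) * default_action
```

For example, with an 8-dimensional action (say, one phase per element of a hypothetical 8-antenna array) and a selector biased to keep only the first four dimensions, `refine_action` passes the actor's values through for those four and holds the rest at the default, so exploration happens in a 4-dimensional subspace instead of the full 8-dimensional one.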
Abstract
The next-generation wireless network, 6G and beyond, envisions integrating communication and sensing to mitigate interference, improve spectral efficiency, and reduce hardware cost and power consumption. Massive Multiple-Input Multiple-Output (mMIMO)-based Joint Communication and Sensing (JCAS) systems realize this integration for 6G applications such as autonomous driving, which requires accurate environmental sensing and time-critical communication with neighboring vehicles. Reinforcement Learning (RL) has been used for mMIMO antenna beamforming in the existing literature. However, the large action space associated with antenna beamforming makes the RL agent's learning process inefficient because of the high beam training overhead. Moreover, the learning process does not consider the causal relationship between the action space and the reward, and gives all actions equal importance. In this work, we explore a causally-aware RL agent that can intervene and discover causal relationships in mMIMO-based JCAS environments during the training phase. We use a state-dependent action-dimension selection strategy to realize causal discovery for RL-based JCAS. Evaluation of the causally-aware RL framework in different JCAS scenarios shows that our proposed framework achieves higher beamforming gain than baseline methods.
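To make the point about the size of the beamforming action space concrete, the sketch below counts the candidate beams of an analog beamformer and evaluates the beamforming gain of a single beam. It assumes, as an illustration rather than as the paper's stated system model, a single RF chain with one `bits`-bit phase shifter per antenna, a constant-modulus beam w = exp(j*theta)/sqrt(N), and the common gain definition |w^H h|^2 for a channel vector h. The function names `codebook_size`, `quantize_phases`, and `beamforming_gain` are hypothetical.

```python
import numpy as np


def codebook_size(num_antennas: int, bits: int) -> int:
    """Number of distinct analog beams with `bits`-bit phase shifters per antenna."""
    return (2 ** bits) ** num_antennas


def quantize_phases(theta: np.ndarray, bits: int) -> np.ndarray:
    """Round each phase to the nearest of 2**bits uniform levels in [0, 2*pi).

    The final modulo wraps phases just below 2*pi back to level 0.
    """
    levels = 2 ** bits
    step = 2 * np.pi / levels
    return (np.round(np.mod(theta, 2 * np.pi) / step) % levels) * step


def beamforming_gain(theta: np.ndarray, h: np.ndarray) -> float:
    """|w^H h|^2 for the constant-modulus beam w = exp(j*theta) / sqrt(N)."""
    n = theta.shape[0]
    w = np.exp(1j * theta) / np.sqrt(n)
    # np.vdot conjugates its first argument, so this computes w^H h.
    return float(np.abs(np.vdot(w, h)) ** 2)
```

Even a modest 64-antenna array with 2-bit phase shifters gives `codebook_size(64, 2) == 4**64`, roughly 3.4e38 candidate beams, which is why exhaustive beam training, or an RL agent that treats every phase dimension as equally important, scales poorly. A beam matched to an all-ones channel (`theta = np.zeros(N)`) reaches the maximum gain N, while mismatched phases can cancel almost entirely.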
