
Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access

Aswin Arun, Christo Kurisummoottil Thomas, Rimalpudi Sarvendranath, Walid Saad

TL;DR

A novel causal model-based MARL framework is developed by leveraging tools from causal learning and is shown to provide interpretable scheduling decisions via attention-based causal attribution, establishing causal MBRL as a practical approach for resource-constrained wireless systems.

Abstract

Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use cases such as medium access control (MAC), its real-world deployment in the Internet of Things (IoT) is hindered by its sample inefficiency. To alleviate this challenge, one can leverage model-based reinforcement learning (MBRL) solutions; however, conventional MBRL approaches rely on black-box models that are not interpretable and cannot reason. In contrast, in this paper, a novel causal model-based MARL framework is developed by leveraging tools from causal learning. In particular, the proposed model can explicitly represent causal dependencies between network variables using structural causal models (SCMs) and attention-based inference networks. Interpretable causal models are then developed to capture how MAC control messages influence observations, how transmission actions determine outcomes, and how channel observations affect rewards. Data augmentation techniques are then used to generate synthetic rollouts from the learned causal model for policy optimization via proximal policy optimization (PPO). Analytical results demonstrate exponential sample complexity gains of causal MBRL over black-box approaches. Extensive simulations demonstrate that, on average, the proposed approach can reduce environment interactions by 58% and yield faster convergence compared to model-free baselines. The proposed approach is also shown to inherently provide interpretable scheduling decisions via attention-based causal attribution, revealing which network conditions drive the policy. The resulting combination of sample efficiency and interpretability establishes causal MBRL as a practical approach for resource-constrained wireless systems.
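To make the abstract's pipeline concrete, here is a minimal sketch, under loudly stated assumptions, of how an SCM with attention-weighted structural equations could generate synthetic rollouts and per-parent attributions. The variable names (`ctrl_msg`, `channel`, `obs`, `action`, `outcome`, `reward`), the parent sets, the toy equation X = Σ_j α_j w_j pa_j + noise, and the helpers `AttentionMechanism` and `synthetic_rollout` are all hypothetical and are not the authors' model. Parameters are random placeholders rather than learned weights, and no PPO update is shown. The imagined transitions are simply what a policy optimizer would consume in place of, or alongside, real environment interactions.

```python
import math
import random

# Hypothetical parent sets loosely following the abstract: MAC control
# messages and the channel drive the observation, the observation and the
# transmission action drive the outcome, and the observation and outcome
# drive the reward. These are NOT the paper's fine-grained variables.
PARENTS = {
    "obs": ["ctrl_msg", "channel"],
    "outcome": ["obs", "action"],
    "reward": ["obs", "outcome"],
}
# Evaluation order; "action" comes from the policy, not from a mechanism.
ORDER = ["obs", "action", "outcome", "reward"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionMechanism:
    """Toy structural equation X = sum_j alpha_j * w_j * pa_j + noise.

    alpha = softmax_j(key_j * pa_j) weights each parent and is returned as a
    per-parent attribution score. Keys and weights are random placeholders
    here; a real system would learn them from environment data.
    """

    def __init__(self, n_parents, rng):
        self.keys = [rng.uniform(-1.0, 1.0) for _ in range(n_parents)]
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(n_parents)]

    def __call__(self, parent_values, noise):
        alpha = softmax([k * v for k, v in zip(self.keys, parent_values)])
        value = sum(a * w * v for a, w, v in zip(alpha, self.weights, parent_values))
        return value + noise, alpha


def synthetic_rollout(mechanisms, policy, horizon, rng, noise_std=0.1):
    """Imagine `horizon` transitions from the causal model, with no env calls."""
    trajectory = []
    for _ in range(horizon):
        values = {
            "ctrl_msg": float(rng.randint(0, 1)),  # exogenous, e.g. grant / no grant
            "channel": rng.gauss(0.0, 1.0),        # exogenous channel state
        }
        attribution = {}
        for node in ORDER:
            if node == "action":
                values["action"] = policy(values["obs"])
                continue
            parents = [values[p] for p in PARENTS[node]]
            values[node], attribution[node] = mechanisms[node](
                parents, rng.gauss(0.0, noise_std)
            )
        trajectory.append({"values": values, "attribution": attribution})
    return trajectory


def transmit_if_good(obs):
    """Hypothetical threshold policy: transmit (1.0) only if obs is positive."""
    return 1.0 if obs > 0.0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    mechanisms = {n: AttentionMechanism(len(ps), rng) for n, ps in PARENTS.items()}
    for step in synthetic_rollout(mechanisms, transmit_if_good, horizon=3, rng=rng):
        print(step["values"]["reward"], step["attribution"]["reward"])
```

In this sketch, the attention weights of the reward mechanism show how much the observation and the outcome each contributed to a given imagined reward. This mirrors, in toy form, the abstract's idea of reading attribution off the attention layer.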

Paper Structure

This paper contains 16 sections, 1 theorem, 13 equations, 3 figures, 2 tables, and 2 algorithms.

Key Result

Theorem 1

For a wireless MAC system with $\lvert\mathcal{V}\rvert$ variables (including state, action, and reward variables), maximum support size $V_{\max}$, causal in-degree $d_{\text{in}}$ (the maximum number of parents of any single node in $\mathcal{G}$), and a desired causal-model accuracy $\epsilon > 0$, causal MBRL versus black-box MBRL's $\mathcal{O}(V_{\max}^{\lvert\mathcal{V}\rvert}/\epsilon^2)$, yielding exponential improvement…
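The intuition behind the exponential gap can be illustrated with standard Bayesian-network parameter counting. An unfactored (black-box) model over $\lvert\mathcal{V}\rvert$ variables with support $V_{\max}$ needs a joint table of $V_{\max}^{\lvert\mathcal{V}\rvert}$ entries. A model that respects a causal graph with in-degree at most $d_{\text{in}}$ needs at most $\lvert\mathcal{V}\rvert \cdot V_{\max}^{d_{\text{in}}+1}$ conditional-table entries. The sketch below does not restate the causal-MBRL bound of Theorem 1. The helper names and the example values ($\lvert\mathcal{V}\rvert = 10$, $V_{\max} = 4$, $d_{\text{in}} = 2$, $\epsilon = 0.1$) are illustrative assumptions only.

```python
def joint_table_size(n_vars, v_max):
    """Entries in an unfactored joint table over n_vars variables."""
    return v_max ** n_vars


def factored_table_size(n_vars, v_max, d_in):
    """Upper bound on entries when each variable keeps a conditional table
    over itself and at most d_in parents (Bayesian-network style counting)."""
    return n_vars * v_max ** (d_in + 1)


def epsilon_scaled(table_size, eps):
    """Scale a table size by 1/eps^2, mirroring the epsilon dependence above."""
    return table_size / eps ** 2


if __name__ == "__main__":
    n_vars, v_max, d_in, eps = 10, 4, 2, 0.1  # illustrative values only
    joint = joint_table_size(n_vars, v_max)              # 4**10 = 1_048_576
    factored = factored_table_size(n_vars, v_max, d_in)  # 10 * 4**3 = 640
    print(joint, factored, joint / factored)
    print(epsilon_scaled(joint, eps), epsilon_scaled(factored, eps))
```

Doubling the number of variables multiplies the joint table by another factor of $V_{\max}^{\lvert\mathcal{V}\rvert}$ but only doubles the factored count. That is the sense in which a bounded in-degree turns exponential dependence on $\lvert\mathcal{V}\rvert$ into linear dependence.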

Figures (3)

  • Figure 1: Causal graph $\mathcal{G}$ for MAC scheduling. An abstract-level causal graph is shown for ease of understanding; the proposed method uses the fine-grained causal graph.
  • Figure 2: Causal MBRL wireless scheduling: average episode rewards vs. $T_{\max}$, showing performance across different $U$ (BLER = 0.5).
  • Figure 3: Average rewards vs. training episodes for different $T_{\max}$.
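The causal graph of Figure 1 is also where the quantities in Theorem 1 come from: $\lvert\mathcal{V}\rvert$ is its node count and $d_{\text{in}}$ its maximum number of parents. A minimal sketch, assuming a hypothetical abstract-level graph built only from the relationships named in the abstract, represents $\mathcal{G}$ as a parent map. It computes $d_{\text{in}}$ and checks acyclicity with Kahn's algorithm. Node names and edges are illustrative and are not the paper's fine-grained graph.

```python
from collections import deque

# Hypothetical abstract-level graph in the spirit of Figure 1: control
# messages and channel -> observation, observation and action -> outcome,
# observation and outcome -> reward. Not the paper's fine-grained graph.
PARENTS = {
    "ctrl_msg": [],
    "channel": [],
    "obs": ["ctrl_msg", "channel"],
    "action": ["obs"],
    "outcome": ["obs", "action"],
    "reward": ["obs", "outcome"],
}


def max_in_degree(parents):
    """d_in: the largest number of parents of any single node."""
    return max(len(ps) for ps in parents.values())


def topological_order(parents):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    children = {v: [] for v in parents}
    indeg = {v: len(ps) for v, ps in parents.items()}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    print("|V| =", len(PARENTS), "d_in =", max_in_degree(PARENTS))
    print("evaluation order:", topological_order(PARENTS))
```

A topological order like this one is the order in which structural equations must be evaluated when sampling from the model, so the acyclicity check doubles as a sanity check on any learned graph.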

Theorems & Definitions (1)

  • Theorem 1