Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks
Lingyi Wang, Rashed Shelim, Walid Saad, Naren Ramakrishnan
TL;DR
This work tackles data inefficiency and short-sighted planning in reinforcement learning for dynamic wireless networks by introducing Dual-Mind World Models (DMWM), which fuse a pattern-driven System 1, based on a recurrent state-space model (RSSM), with a logic-driven System 2 that uses neural-symbolic reasoning (LINN). A logic-enhanced evidence lower bound (LE-ELBO) ties imagined trajectories to logical consistency, enabling reliable long-horizon planning of link scheduling for a completeness-weighted age of information (CAoI) minimization problem in a mmWave V2X network. The authors validate their approach on a realistic Sionna-based simulator augmented with Blender and ArcGIS, reporting significant gains in data efficiency and CAoI reduction (up to about 32% over strong baselines) and strong generalization to unseen road scenes and network configurations. The results support a differentiable, imagination-driven planning paradigm that integrates physics-based network dynamics with symbolic logic, offering a path toward scalable, robust wireless control under complex dynamics.
Abstract
Despite the popularity of reinforcement learning (RL) in wireless networks, existing approaches that rely on model-free RL (MFRL) and model-based RL (MBRL) are data-inefficient and short-sighted. Such RL-based solutions cannot generalize to novel network states because they capture only statistical patterns rather than the underlying physics and logic of wireless data. These limitations become particularly challenging in complex wireless networks with high dynamics and long-term planning requirements. To address these limitations, this paper proposes a novel dual-mind world model-based learning framework with the goal of optimizing completeness-weighted age of information (CAoI) in a challenging mmWave V2X scenario. Inspired by cognitive psychology, the proposed dual-mind world model encompasses a pattern-driven System 1 component and a logic-driven System 2 component to learn the dynamics and logic of the wireless network and to provide long-term link scheduling over reliable imagined trajectories. Link scheduling is learned through end-to-end differentiable imagined trajectories that remain logically consistent over an extended horizon, rather than from wireless data obtained through environment interactions. Moreover, through imagination rollouts, the proposed world model can jointly reason about network states and plan link scheduling. During intervals without observations, the proposed method remains capable of making efficient decisions. Extensive experiments are conducted on a realistic simulator based on Sionna with real-world physical channels, ray tracing, and scene objects with material properties. Simulation results show that the proposed world model achieves a significant improvement in data efficiency, as well as strong generalization and adaptation to unseen environments, compared to state-of-the-art RL baselines and to a world model approach with only System 1.
