DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets
Guangyuan Liu, Yinqiu Liu, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Sumei Sun, Abbas Jamalipour, Ping Zhang
TL;DR
This work addresses the joint optimization of transmit beamforming and power-splitting ratios in SWIPT-enabled satellite–terrestrial HetNets under time-varying channels and multi-tier interference. It introduces DWM-RO, which combines decentralized world models with imagination-based planning, uncertainty-aware offloading to an edge server, and a lightweight latent decorrelation mechanism that coordinates agents' actions. In simulations, DWM-RO converges about 5 times faster than state-of-the-art baselines, achieves 34.7% higher spectral efficiency, and reduces constraint violations by 40%; in dense networks with 10 users, it keeps violation rates below 20% while baselines exceed 70%. The framework offers a scalable approach to distributed resource optimization in next-generation wireless networks where cross-tier interference and dynamic energy harvesting (EH) are critical concerns.
Abstract
Wireless networks are undergoing a paradigm shift toward massive connectivity with energy-efficient operation, driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems is difficult because time-varying channels and multi-tier interference create a complex decision landscape. In this setting, conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency, since some state transitions are rarely encountered, and from poor coordination, since decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these limitations. Specifically, each agent employs a world model to learn compact predictive representations of environment dynamics, enabling imagination-based policy training that reduces the number of required environment interactions. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors to trigger selective edge coordination. When the gate is activated, a lightweight latent decorrelation mechanism at the edge refines agents' strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations demonstrate that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO maintains violation rates below 20% while baselines exceed 70%, indicating greater robustness.
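The abstract does not specify how the offloading gate or the latent decorrelation is computed, so the following is only an illustrative sketch under stated assumptions. It assumes the gate is a simple threshold test on local interference and world-model reconstruction error, and it substitutes a Gram-Schmidt-style orthogonalization for the paper's decorrelation mechanism. The function names, thresholds, and the choice of Gram-Schmidt are all hypothetical and are not taken from the paper.

```python
import math


def should_offload(interference, recon_error,
                   interference_thresh=1.0, error_thresh=0.1):
    """Hypothetical gate: request edge coordination when either the local
    interference or the world-model reconstruction error is high.
    Both thresholds are illustrative placeholders, not values from the paper."""
    return interference > interference_thresh or recon_error > error_thresh


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decorrelate(latents, eps=1e-12):
    """Gram-Schmidt-style stand-in for latent decorrelation (an assumption):
    each agent's latent has its components along previously processed
    latents removed, so the returned vectors are mutually orthogonal."""
    basis = []   # orthonormal directions seen so far
    refined = []
    for z in latents:
        r = list(z)
        for b in basis:
            coeff = dot(r, b)
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        refined.append(r)
        norm = math.sqrt(dot(r, r))
        if norm > eps:  # skip near-zero residuals to avoid dividing by ~0
            basis.append([ri / norm for ri in r])
    return refined


# Two agents with overlapping latents; only coordinate if the gate fires.
latents = [[1.0, 0.0], [1.0, 1.0]]
if should_offload(interference=2.0, recon_error=0.05):
    latents = decorrelate(latents)
```

In this sketch the first agent's latent is left unchanged and later agents are adjusted relative to it, which makes the result order-dependent; a learned or symmetric decorrelation, as the paper may use, would not necessarily share that property.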
