DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets

Guangyuan Liu, Yinqiu Liu, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Sumei Sun, Abbas Jamalipour, Ping Zhang

TL;DR

This work tackles the challenge of jointly optimizing transmit beamforming and power-splitting ratios in SWIPT-enabled satellite–terrestrial HetNets under time-varying channels and multi-tier interference. It introduces DWM-RO, which combines decentralized world models for imagination-based policy training, uncertainty-aware offloading to an edge server, and a lightweight latent decorrelation mechanism that coordinates the agents' actions. The approach converges about 5x faster than state-of-the-art baselines, raises spectral efficiency by 34.7%, and reduces constraint violations by 40%. In dense networks with 10 users, it keeps violation rates below 20% while baselines exceed 70%. The framework offers a practical, scalable solution for distributed resource optimization in next-generation wireless networks where cross-tier interference and dynamic energy harvesting (EH) are critical concerns.

Abstract

Wireless networks are undergoing a paradigm shift toward massive connectivity with energy-efficient operation, driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems is challenging: time-varying channels and multi-tier interference create a complex decision landscape. In this landscape, conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency, because many state transitions are rarely encountered, and from poor coordination, because decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these limitations. Specifically, each agent employs a world model to learn compact predictive representations of the environment dynamics, enabling imagination-based policy training that greatly reduces the number of required environment interactions. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors to trigger edge coordination selectively. When activated, a lightweight latent decorrelation mechanism at the edge refines the agents' strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations demonstrate that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO maintains violation rates below 20% while baselines exceed 70%, validating its robustness.
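The uncertainty-aware offloading gate can be pictured as a simple threshold rule on the two signals named in the abstract. This is a minimal sketch under stated assumptions, not the authors' implementation: mean-squared error as the reconstruction-error measure, fixed thresholds, and an OR combination of the two triggers are all hypothetical choices, and the names `reconstruction_error` and `OffloadGate` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


def reconstruction_error(obs: Sequence[float], recon: Sequence[float]) -> float:
    """Mean-squared error between a local observation and the world model's reconstruction."""
    if len(obs) != len(recon) or not obs:
        raise ValueError("observation and reconstruction must be non-empty and equal length")
    return sum((o - r) ** 2 for o, r in zip(obs, recon)) / len(obs)


@dataclass
class OffloadGate:
    """Hypothetical per-agent gate producing the binary offloading decision D_k(t)."""

    interference_threshold: float  # assumed threshold on measured interference power (W)
    error_threshold: float  # assumed threshold on reconstruction MSE

    def decide(self, interference: float, recon_error: float) -> int:
        """Return 1 to offload the latent state to the edge, 0 to act locally."""
        # Assumption: strong interference OR an unreliable world model is
        # enough on its own to request edge coordination.
        high_interference = interference > self.interference_threshold
        model_uncertain = recon_error > self.error_threshold
        return int(high_interference or model_uncertain)


if __name__ == "__main__":
    gate = OffloadGate(interference_threshold=1e-9, error_threshold=0.05)
    err = reconstruction_error([0.2, 0.4, 0.6], [0.25, 0.35, 0.6])
    print(gate.decide(interference=5e-10, recon_error=err))  # 0: quiet channel, accurate model
    print(gate.decide(interference=2e-9, recon_error=err))  # 1: interference above threshold
    print(gate.decide(interference=5e-10, recon_error=0.2))  # 1: world model is uncertain
```

Using an OR rule makes the gate conservative (it offloads whenever either signal looks risky); an AND rule would offload less often and save more edge bandwidth, and which one fits DWM-RO depends on details not given here.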

Paper Structure

This paper contains 33 sections, 35 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: An illustration of the considered SWIPT-enabled satellite-terrestrial HetNet model. (a) The overall network topology. (b) The communication model for each SUE, which receives a desired signal from the satellite along with inter-SUE interference from other satellite beams and cross-tier interference from the FBS [sharma2017performance, 10032267]. (c) The communication and receiver model for one FUE, which receives a desired signal and co-channel interference from the FBS, in addition to cross-tier interference from the satellite. The inset details its power-splitting (PS) circuit, where the received signal is split between information decoding (ID), which adds processing noise $\sigma_{b}^{2}$, and energy harvesting (EH).
  • Figure 2: DWM-RO architecture. Module 1: Each agent is equipped with a world model that is trained in two phases. First, a recurrent state-space model (RSSM) is trained on real environment interactions to learn a predictive model of the network dynamics. Second, an actor-critic (AC) policy is trained on both real and imagined samples. Module 2: A dedicated gate runs on each agent and uses local interference and model reconstruction error to make a binary offloading decision $\mathbb{D}_k(t)$. Module 3: When offloading is triggered, the agent's latent state $\mathbf{z}_k(t)$ is transmitted to the FBS, which aggregates latents from multiple agents, computes their common component, and returns a decorrelated latent state $\tilde{\mathbf{z}}_k(t)$ to the agent for coordinated action generation.
  • Figure 3: Visual and quantitative analysis of the uncertainty-aware reasoning offloading mechanism.
  • Figure 4: Performance evaluation of the base world model (Pure DWM) against baselines without offloading. The proposed Pure DWM demonstrates superior convergence speed, final reward, spectral efficiency, and constraint satisfaction.
  • Figure 5: Analysis of action correlation and reward improvement over 200 time steps. The "always offload" strategy sharply reduces action correlation compared with the baseline. The green dots indicate that this decorrelation yields a reward improvement at nearly every step.
  • ...and 3 more figures
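The PS receiver in Figure 1(c) can be illustrated with the textbook power-splitting SWIPT model. This sketch assumes that a fraction $\rho$ of the received power (signal $S$, interference $I$, antenna noise $\sigma_a^2$) goes to ID, that the processing noise $\sigma_b^2$ is added after the splitter, giving $\mathrm{SINR} = \rho S / (\rho (I + \sigma_a^2) + \sigma_b^2)$, and that a linear harvester with efficiency $\eta$ collects $E = \eta (1-\rho)(S + I)$ while neglecting noise. The paper's exact expressions may differ; the function name `ps_receiver` and the default $\eta = 0.6$ are hypothetical.

```python
import math
from typing import NamedTuple


class PSOutput(NamedTuple):
    sinr: float  # linear SINR at the information decoder
    rate: float  # achievable rate in bit/s/Hz
    harvested_w: float  # harvested DC power in watts


def ps_receiver(signal_w: float, interference_w: float, rho: float,
                sigma_a2_w: float, sigma_b2_w: float, eta: float = 0.6) -> PSOutput:
    """Assumed power-splitting SWIPT receiver (illustrative, not the paper's exact model).

    rho is the fraction of received power routed to information decoding (ID);
    the remaining 1 - rho goes to energy harvesting (EH).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if sigma_b2_w <= 0.0:
        raise ValueError("processing noise power must be positive")
    # ID branch: signal, interference, and antenna noise are all scaled by rho;
    # processing noise sigma_b^2 is added after the splitter.
    sinr = rho * signal_w / (rho * (interference_w + sigma_a2_w) + sigma_b2_w)
    rate = math.log2(1.0 + sinr)
    # EH branch: linear harvester with efficiency eta; noise power is neglected.
    harvested_w = eta * (1.0 - rho) * (signal_w + interference_w)
    return PSOutput(sinr, rate, harvested_w)


if __name__ == "__main__":
    # Sweep rho to expose the ID/EH trade-off for one hypothetical FUE.
    for rho in (0.2, 0.5, 0.8):
        out = ps_receiver(signal_w=1e-6, interference_w=1e-7, rho=rho,
                          sigma_a2_w=1e-9, sigma_b2_w=1e-8)
        print(f"rho={rho:.1f}  rate={out.rate:.2f} bit/s/Hz  EH={out.harvested_w:.2e} W")
```

The sweep makes the coupling visible: raising $\rho$ increases the rate but shrinks harvested power, which is why the power-splitting ratio has to be optimized jointly with beamforming.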
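Module 3 in Figure 2 computes a common component of the offloaded latents and returns decorrelated latents. The caption does not say how the common component is defined, so this sketch assumes it is the element-wise mean across offloading agents and that each agent receives $\tilde{\mathbf{z}}_k(t) = \mathbf{z}_k(t) - \lambda \bar{\mathbf{z}}(t)$, where the strength $\lambda$ is a hypothetical knob. Mean removal is a swapped-in stand-in for the paper's mechanism, and `decorrelate_latents` is an illustrative name.

```python
from typing import Dict, List

Latent = List[float]


def decorrelate_latents(latents: Dict[int, Latent], strength: float = 1.0) -> Dict[int, Latent]:
    """Remove the component shared by all offloaded latents (hypothetical edge step).

    latents maps agent index k to its latent vector z_k(t). The common component
    is assumed to be the element-wise mean across agents; each returned vector is
    z_k(t) - strength * mean.
    """
    if len(latents) < 2:
        # With fewer than two agents there is nothing to coordinate against.
        return {k: list(z) for k, z in latents.items()}
    dims = {len(z) for z in latents.values()}
    if len(dims) != 1:
        raise ValueError("all latent vectors must have the same dimension")
    (dim,) = dims
    n = len(latents)
    common = [sum(z[i] for z in latents.values()) / n for i in range(dim)]
    return {
        k: [zi - strength * ci for zi, ci in zip(z, common)]
        for k, z in latents.items()
    }


if __name__ == "__main__":
    # Three agents whose latents share a common offset.
    offloaded = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
    print(decorrelate_latents(offloaded))
    # {0: [0.0, -1.0], 1: [-1.0, 0.0], 2: [1.0, 1.0]}
```

With `strength=1.0` the returned latents sum to zero across agents in every dimension, so no shared direction remains for all agents to push toward at once; smaller strengths keep part of the common context.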
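Figure 5 reports action correlation without defining the metric. One plausible choice, assumed here, is the mean absolute pairwise Pearson correlation of the agents' action traces over the evaluation window: values near 1 mean agents move in lockstep (and are likely to collide on resources), values near 0 mean their actions vary independently. The function names and the example traces are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length action traces (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # convention: a constant trace carries no correlation
    return sxy / math.sqrt(sxx * syy)


def mean_abs_action_correlation(actions: Sequence[Sequence[float]]) -> float:
    """Average |Pearson correlation| over all agent pairs; one trace per agent."""
    pairs = list(combinations(range(len(actions)), 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(actions[i], actions[j])) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical normalized action traces for two agents over four steps.
    lockstep = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8]]
    independent = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
    print(mean_abs_action_correlation(lockstep))  # close to 1.0
    print(mean_abs_action_correlation(independent))  # 0.0
```

Taking the absolute value treats strongly anti-correlated agents as coordinated too, which is a design choice; a signed average would let positive and negative pairs cancel and hide lockstep behavior.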

Theorems & Definitions (2)

  • Remark 1
  • Example 1