Decentralized Covert Routing in Heterogeneous Networks Using Reinforcement Learning

Justin Kong; Terrence J. Moore; Fikadu T. Dagefu

TL;DR

This work addresses covert routing in heterogeneous networks where multiple communication modalities are available. It introduces a decentralized Q-learning framework (Q-covert routing) in which each node selects the next hop and a modality using only local feedback, aiming to maximize the end-to-end detection-exclusion probability $P_{\text{DEP}}$ while enforcing a throughput constraint $U_{\text{target}}$. The method defines state, action, and cost as $S_T$, $A_T$, and $c_T(a)=\ln(1/P_{\text{DEP},h(a)})$, and updates $Q_T(s,a)$ via $Q_T(s,a) \leftarrow (1-\alpha)Q_T(s,a) + \alpha ( c_T(a) + \gamma \hat{c}_T(a) )$ with an $\epsilon$-greedy policy. Numerical results show that the decentralized approach closely matches the centralized optimum (e.g., $P_{\text{DEP}}$ around 0.78 with a negligible gap) and outperforms naive routing methods, demonstrating scalability and robustness to Willie’s position.
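The update rule summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy topology `HOPS`, the per-hop DEP values, the helper names, and the hyperparameters are all hypothetical. The sketch also assumes that the feedback term $\hat{c}_T(a)$ is the smallest Q-value reported by the chosen next hop (zero at the destination), that actions violating $U_{\text{target}}$ have already been filtered out, and that $\gamma = 1$. Because the cost is $\ln(1/P_{\text{DEP}})$, per-hop costs add up along a route, so minimizing the summed cost is equivalent to maximizing the product of per-hop DEPs.

```python
import math
import random

# Hypothetical toy topology: per-hop DEP for each (next_hop, modality) action.
# Actions that would violate the throughput target are assumed to be excluded.
HOPS = {
    "S": {("A", "M1"): 0.90, ("B", "M2"): 0.60},
    "A": {("D", "M1"): 0.80, ("D", "M2"): 0.95},
    "B": {("D", "M1"): 0.99},
}

def hop_cost(p_dep):
    """Per-hop cost ln(1 / P_DEP); zero when the hop is never detected."""
    return math.log(1.0 / p_dep)

def epsilon_greedy(q_row, epsilon, rng=random):
    """Explore with probability epsilon, else pick the lowest-cost action."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))
    return min(q_row, key=q_row.get)

def q_update(q_old, cost, feedback, alpha, gamma):
    """Q <- (1 - alpha) * Q + alpha * (cost + gamma * feedback)."""
    return (1.0 - alpha) * q_old + alpha * (cost + gamma * feedback)

def train(hops, source, dest, episodes=500, alpha=0.5, gamma=1.0,
          epsilon=0.2, seed=0):
    """Each node keeps its own Q-table and uses only next-hop feedback."""
    rng = random.Random(seed)
    q = {node: {a: 0.0 for a in acts} for node, acts in hops.items()}
    for _ in range(episodes):
        node = source
        while node != dest:
            action = epsilon_greedy(q[node], epsilon, rng)
            nxt = action[0]
            # Assumed local feedback: best (smallest) Q-value at the next hop.
            feedback = 0.0 if nxt == dest else min(q[nxt].values())
            cost = hop_cost(hops[node][action])
            q[node][action] = q_update(q[node][action], cost, feedback,
                                       alpha, gamma)
            node = nxt
    return q

def greedy_route(q, hops, source, dest):
    """Follow the lowest-cost action at each node; return route and DEP."""
    route, dep, node = [], 1.0, source
    while node != dest:
        action = min(q[node], key=q[node].get)
        route.append((node,) + action)
        dep *= hops[node][action]
        node = action[0]
    return route, dep
```

Calling `train(HOPS, "S", "D")` and then `greedy_route(q, HOPS, "S", "D")` on this toy network should select the route through `A`, whose end-to-end DEP is the product of the two per-hop values along it. Keeping one Q-table per node mirrors the decentralized setting described above, where no node needs global knowledge of the network.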

Abstract

This letter investigates covert routing communications in a heterogeneous network in which a source transmits confidential data to a destination with the aid of relaying nodes, and each transmitter judiciously chooses one modality among multiple communication modalities. We develop a novel reinforcement learning-based covert routing algorithm that finds a route from the source to the destination, in which each node identifies its next hop and modality based only on the local feedback information received from its neighboring nodes. Numerical simulations show that the proposed covert routing strategy incurs only negligible performance loss compared to the optimal centralized routing scheme.

Paper Structure (11 sections, 16 equations, 4 figures)

This paper contains 11 sections, 16 equations, 4 figures.

Introduction
Network Model and Problem Formulation
Network Model
Detection at Willie
Problem Formulation
Proposed Q-Covert Routing for HetNets
Definitions of State, Action, and Cost
Proposed Algorithm
Centralized Approach
Numerical Results
Conclusion

Figures (4)

Figure 1: A 3D simulation environment with 36 legitimate nodes.
Figure 2: Optimized routes and selected modalities. The hops with modalities $M_1$, $M_2$, and $M_3$ are highlighted in blue, magenta, and green, respectively. The routes with solid lines and dashed lines are from the proposed technique and the centralized method, respectively.
Figure 3: The end-to-end DEP performance as a function of the X coordinate of Willie's location.
Figure 4: The end-to-end DEP performance of the proposed Q-covert routing algorithm as a function of the number of episodes $n_{\text{episode}}$.
