Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning

Cheng Wang; Christopher Redino; Abdul Rahman; Ryan Clark; Daniel Radke; Tyler Cody; Dhruv Nandakumar; Edward Bowen

TL;DR

This work addresses the challenge of automatically discovering resilient C2 channels across public and Tor networks under firewall constraints. It casts the problem as a multi-stage reinforcement learning task in a Markov decision process with state space $\mathcal{S}$, action space $\mathcal{A}$, transition $\mathcal{P}$, reward $r$, and discount factor $\gamma$, optimized via Proximal Policy Optimization (PPO) with a clipped objective to control policy updates. The main contributions are: (i) a three-stage C2 simulation incorporating Tor and public channels, (ii) a CVSS-MDP style reward framework that accounts for defender actions, and (iii) empirical evidence that the RL agent can identify viable attack paths and evade firewall defenses in a standard network setting, achieving roughly 60% success over 100 trials. The results underscore the potential of RL for threat modeling and can guide defense planning, firewall tuning, and detection strategies.
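The clipped objective and the discount factor $\gamma$ mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ppo_clipped_objective` and `discounted_returns`, the clipping range `clip_eps=0.2`, and the default `gamma=0.99` are assumptions chosen for illustration, since this summary does not state the paper's hyperparameters. The sketch shows only the standard PPO idea that the probability ratio between the new and old policy is clipped to $[1-\epsilon, 1+\epsilon]$, so that a single update cannot move the policy too far.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t = r_t + gamma * G_{t+1} for each step.

    `gamma` is the MDP discount factor; 0.99 is an illustrative default,
    not a value taken from the paper.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. `clip_eps` is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above $1+\epsilon$ earns no extra objective value, which removes the incentive to push the policy further in that direction; with a negative advantage, the unclipped term is kept so that harmful moves are still fully penalized.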

Abstract

Command and control (C2) channels are an essential component of many types of cyber attacks, as they enable attackers to remotely control their malware-infected machines and execute harmful actions, such as propagating malicious code across networks, exfiltrating confidential data, or initiating distributed denial of service (DDoS) attacks. Identifying these C2 channels is therefore crucial in helping to mitigate and prevent cyber attacks. However, identifying C2 channels typically involves a manual process, requiring deep knowledge and expertise in cyber operations. In this paper, we propose a reinforcement learning (RL) based approach to automatically emulate C2 attack campaigns using both the normal (public) and the Tor networks. In addition, payload size and network firewalls are configured to simulate real-world attack scenarios. Results on a typical network configuration show that the RL agent can automatically discover resilient C2 attack paths utilizing both Tor-based and conventional communication channels, while also bypassing network firewalls.

Paper Structure (17 sections, 6 equations, 3 figures, 5 tables)

This paper contains 17 sections, 6 equations, 3 figures, 5 tables.

Introduction
Related Work
Background
Methods
Attack Simulation Overview
Infection Stage
Connection Stage
Exfiltration Stage
Reinforcement Learning Formulation
States
Actions
Rewards
Experiments
Network Description
Training Details
...and 2 more sections

Figures (3)

Figure 1: Network diagram with node IDs and services listed. Services include File Transfer Protocol (ftp), Hypertext Transfer Protocol (http), virtual private network (vpn), Structured Query Language (sql), Secure Shell (ssh), Samba (samba), public key infrastructure (pki), Simple Mail Transfer Protocol (smtp), and MongoDB (mongodb). An initial foothold is gained on host (1,0) (the green node) in subnet 1. Two targets are identified and highlighted in orange.
Figure 2: Average of episode rewards (top) and steps (bottom) during the training process for targets (5,1) and (7,3), respectively.
Figure 3: Timelines of connect and upload actions taken in the attack paths to host (5, 1) (top) and (7,2) (bottom).
