Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning
Cheng Wang, Christopher Redino, Abdul Rahman, Ryan Clark, Daniel Radke, Tyler Cody, Dhruv Nandakumar, Edward Bowen
TL;DR
This work addresses the challenge of automatically discovering resilient C2 channels across public and Tor networks under firewall constraints. It casts the problem as a multi-stage reinforcement learning task in a Markov decision process with state space $\mathcal{S}$, action space $\mathcal{A}$, transition $\mathcal{P}$, reward $r$, and discount factor $\gamma$, optimized via Proximal Policy Optimization (PPO) with a clipped objective to control policy updates. The main contributions are: (i) a three-stage C2 simulation incorporating Tor and public channels, (ii) a CVSS-MDP style reward framework that accounts for defender actions, and (iii) empirical evidence that the RL agent can identify viable attack paths and evade firewall defenses in a standard network setting, achieving roughly 60% success over 100 trials. The results underscore the potential of RL for threat modeling and can guide defense planning, firewall tuning, and detection strategies.
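As a rough illustration of the clipped objective mentioned above, the following minimal sketch computes the standard PPO clipped surrogate for a batch of samples. It is not the authors' implementation: the function name `ppo_clip_objective`, the argument names, and the clipping value `clip_eps=0.2` are assumptions chosen for illustration, and the advantage estimates are taken as given.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed
    value, not one reported in the source.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clipping caps how much the objective can gain from raising an action's probability, so a single update has no incentive to move the policy far from the one that collected the data.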
Abstract
Command and control (C2) channels are an essential component of many types of cyber attacks, as they enable attackers to remotely control their malware-infected machines and execute harmful actions, such as propagating malicious code across networks, exfiltrating confidential data, or initiating distributed denial of service (DDoS) attacks. Identifying these C2 channels is therefore crucial to mitigating and preventing cyber attacks. However, identifying C2 channels is typically a manual process that requires deep knowledge and expertise in cyber operations. In this paper, we propose a reinforcement learning (RL) based approach to automatically emulate C2 attack campaigns using both the normal (public) and the Tor networks. In addition, payload size and network firewalls are configured to simulate real-world attack scenarios. Results on a typical network configuration show that the RL agent can automatically discover resilient C2 attack paths utilizing both Tor-based and conventional communication channels, while also bypassing network firewalls.
