Examining the Rat in the Tunnel: Interpretable Multi-Label Classification of Tor-based Malware

Ishan Karunanayake, Mashael AlSabah, Nadeem Ahmed, Sanjay Jha

TL;DR

A multi-label classification technique based on Message-Passing Neural Networks outperforms previous approaches such as Binary Relevance, Classifier Chains, and Label Powerset, achieving micro-average precision (MAP) and recall (MAR) exceeding 90%.

Abstract

Despite being the most popular privacy-enhancing network, Tor is increasingly adopted by cybercriminals to obfuscate malicious traffic, hindering the identification of malware-related communications between compromised devices and Command and Control (C&C) servers. This malicious traffic can induce congestion and reduce Tor's performance, while encouraging network administrators to block Tor traffic. Recent research, however, demonstrates the potential for accurately classifying captured Tor traffic as malicious or benign. While existing efforts have addressed malware class identification, their performance remains limited, with micro-average precision and recall values around 70%. Accurately classifying specific malware classes is crucial for effective attack prevention and mitigation. Furthermore, understanding the unique patterns and attack vectors employed by different malware classes aids the development of robust and adaptable defence mechanisms. We utilise a multi-label classification technique based on Message-Passing Neural Networks, demonstrating its superiority over previous approaches such as Binary Relevance, Classifier Chains, and Label Powerset, by achieving micro-average precision (MAP) and recall (MAR) exceeding 90%. Compared to previous work, we significantly improve performance by 19.98%, 10.15%, and 59.21% in MAP, MAR, and Hamming Loss, respectively. Next, we employ Explainable Artificial Intelligence (XAI) techniques to interpret the decision-making process within these models. Finally, we assess the robustness of all techniques by crafting adversarial perturbations capable of manipulating classifier predictions and generating false positives and negatives.
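
The three metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of micro-averaged precision, micro-averaged recall, and Hamming loss over a binary label matrix; the function name `micro_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
def micro_metrics(y_true, y_pred):
    """Return (micro precision, micro recall, Hamming loss) for 0/1 label matrices.

    Micro-averaging pools true/false positives over every sample and label
    before dividing; Hamming loss is the fraction of label slots predicted wrongly.
    """
    tp = fp = fn = wrong = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            if t != p:
                wrong += 1
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    hamming = wrong / total if total else 0.0
    return precision, recall, hamming


# Toy example (illustrative only): 2 samples, 3 malware-class labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
precision, recall, hamming = micro_metrics(y_true, y_pred)
```

For precision and recall higher is better, while for Hamming loss lower is better, so an improvement in Hamming loss corresponds to a reduction in its value.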

Paper Structure

This paper contains 40 sections, 8 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: How Malware can use Tor. (A) The botmaster can access the C&C server (hidden service or not) via Tor. (B) C&C and its victims (bots) communicate via Tor. (C) Infected machines (bots) can propagate the malware or attack other networks via Tor.
  • Figure 2: Label Co-occurrence Diagram for the D5 dataset. The diagram on the left provides a more holistic view of the label co-occurrences, while the diagram on the right shows additional co-occurrences that cannot be depicted in the left diagram. The unknown circle is isolated as it represents unidentified malware classes.
  • Figure 3: Concept behind LaMP. A given set of input features is embedded into input nodes $\{Z_1,Z_2,Z_3\}$ and the labels are embedded into label nodes $\{V_1,V_2,V_3,V_4\}$. The embedded label nodes are used to create a label interaction graph. First, messages are passed from input nodes to label nodes to update the label nodes. Then, messages are passed between label nodes before they are updated again. After $T$ iterations, a readout function $R$ performs node-level classification and makes a binary prediction for each label, $\{Y_1,Y_2,Y_3,Y_4\}$.
  • Figure 4: Summary plot for each class with the ten most important features - BR
  • Figure 5: Summary plot for feature importance - BR
  • ...and 6 more figures
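
The message-passing loop described in the Figure 3 caption can be sketched roughly as follows. This is a simplified illustration, not the paper's LaMP implementation: it assumes scaled dot-product attention for building messages, a plain additive update with no learned projections, and a sigmoid readout thresholded at 0.5. All names (`attend`, `message_passing`, `readout_weights`) and the toy embeddings are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, sources):
    """Build one message for `target` as an attention-weighted sum of `sources`."""
    scale = math.sqrt(len(target))
    weights = softmax([dot(target, s) / scale for s in sources])
    return [sum(w * s[i] for w, s in zip(weights, sources))
            for i in range(len(target))]


def message_passing(inputs, labels, adjacency, readout_weights, steps=2):
    """Run `steps` rounds (T in the caption) and return (binary predictions, probabilities)."""
    labels = [list(v) for v in labels]
    for _ in range(steps):
        # Step 1: input nodes Z send messages to every label node V.
        labels = [[a + m for a, m in zip(v, attend(v, inputs))] for v in labels]
        # Step 2: label nodes exchange messages along the label interaction graph.
        updated = []
        for i, v in enumerate(labels):
            neighbours = [labels[j] for j in adjacency[i]]
            msg = attend(v, neighbours) if neighbours else [0.0] * len(v)
            updated.append([a + m for a, m in zip(v, msg)])
        labels = updated
    # Readout R: one sigmoid score per label node, thresholded at 0.5 (assumed).
    probs = [1.0 / (1.0 + math.exp(-dot(w, v)))
             for w, v in zip(readout_weights, labels)]
    return [int(p >= 0.5) for p in probs], probs


# Toy embeddings (illustrative only): 3 input nodes, 4 label nodes, dimension 2.
Z = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
V = [[0.2, 0.0], [-0.1, 0.3], [0.4, 0.1], [0.0, -0.2]]
graph = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}  # fully connected
R = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0], [0.3, -0.7]]
Y, scores = message_passing(Z, V, graph, R, steps=2)
```

In a trained model the embeddings, attention projections, and readout weights would be learned from data; here they are fixed constants so that only the two-stage message flow is shown.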