Malicious Internet Entity Detection Using Local Graph Inference

Simon Mandlik, Tomas Pevny, Vaclav Smidl, Lukas Bajer

TL;DR

This work tackles malicious Internet entity detection in massive, heterogeneous graphs by reframing the problem as local, neighborhood-based inference. It introduces HMILnet, a Hierarchical Multiple Instance Learning architecture, to process streamlined neighborhood subgraphs around a central vertex, enabling high expressivity and scalable inference without requiring global graph training. Empirically, HMILnet outperforms the Probabilistic Threat Propagation baseline in multi-relational settings and generalizes to unseen domains, with performance improving further when leveraging all available relation types. The approach is practical for large-scale cyber defense, as it relies on raw interaction data and a dynamic denylist, and it offers avenues for incorporating external features and improving interpretability in future work.

Abstract

Detection of malicious behavior in a large network is a challenging problem for machine learning in computer security, since it requires a model with high expressive power and scalable inference. Existing solutions struggle to achieve this feat -- current cybersec-tailored approaches are still limited in expressivity, and methods successful in other domains do not scale well for large volumes of data, rendering frequent retraining impossible. This work proposes a new perspective on learning from graph data: modeling network entity interactions as a large heterogeneous graph. High expressivity of the method is achieved with the neural network architecture HMILnet, which naturally models this type of data and provides theoretical guarantees. Scalability is achieved by pursuing local graph inference, i.e., classifying individual vertices and their neighborhoods as independent samples. Our experiments exhibit improvement over the state-of-the-art Probabilistic Threat Propagation (PTP) algorithm, show a further threefold accuracy improvement when additional data is used, which is not possible with the PTP algorithm, and demonstrate the generalization capabilities of the method to new, previously unseen entities.

Paper Structure

This paper contains 34 sections, 12 equations, 14 figures, 7 tables, 2 algorithms.

Figures (14)

  • Figure 1: An example of a network graph of binaries (represented by SHA hashes) in purple, second-level domains in red, URL paths in green, IP addresses in gray, and emails in yellow. Edges represent an interaction, for example, when a binary has contacted a domain hosted on an IP address or when a domain is registered with an email address. Sequences of many repeated characters in URL names were shortened using ellipsis. The example is taken from https://www.threatcrowd.org/, and the graph was symmetrized for demonstration purposes. Best viewed in color.
  • Figure 2: A depiction of a model to solve MIL problems used in Pevny2017a and Edwards2016.
  • Figure 3: Graph transformation of a bipartite graph $\mathcal{G}^{(i)}$ (on the left) to a transformed graph $G^{(i)}$ (on the right). Edges in the transformed graph are labeled with the names of vertices that interacted with both incident vertices in the original graph. Note that vertices $v_1$, $v_7$, and $v_8$ all have only one neighbor and thus do not influence the resulting transformed graph.
  • Figure 4: The whole model procedure for the case when $T = 1$. Input bipartite graphs $\mathcal{G}^{(i)}$ representing input binary relations on the left are first used to obtain the same number of transformed graphs $G^{(i)}$. The central vertex representing one particular domain is highlighted in each graph, as are its neighbors in the transformed graph. Note that the left part of each bipartite graph and each transformed graph consists of an identical set of vertices (in our case representing domains), as opposed to the edges, which differ with each relation. In the next phase, we aggregate vertex and edge features as explained in Section \ref{sec:hmilnet_based_graph_inference}. Each of the three graphs uses (sub)models $\widehat{f}^{(i)}, g^{(i)}, a^{(i)}, \widetilde{f}^{(i)}, r^{(i)}$ that do not share parameters and may even have different topology. Finally, the product construction is done to obtain the final output, which in the 1-step case is interpreted as a vector of predictive probabilities. Best viewed in color.
  • Figure 5: A PR curve and an ROC curve with logarithmic $x$ axis comparing the performance of the three proposed architectures to the PTP algorithm. In these experiments, only one relation is employed.
  • ...and 9 more figures
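
The graph transformation described in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation; it only assumes what the captions state: two left-side vertices (domains, per Figure 4) become adjacent in the transformed graph when they share at least one neighbor in the bipartite graph, and the edge carries the names of those shared neighbors. The function name `transform_bipartite` and the toy domain/IP data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def transform_bipartite(edges):
    """Project a bipartite relation onto its left-side vertices.

    edges: iterable of (left_vertex, right_vertex) pairs.
    Returns a dict mapping frozenset({u, w}) to the set of right-side
    vertices that interacted with both u and w.
    """
    # Group left-side vertices by the right-side vertex they interact with.
    left_by_right = defaultdict(set)
    for left, right in edges:
        left_by_right[right].add(left)

    transformed = defaultdict(set)
    for right, lefts in left_by_right.items():
        # A right-side vertex with a single neighbor produces no pair,
        # so it does not influence the transformed graph.
        for u, w in combinations(sorted(lefts), 2):
            transformed[frozenset((u, w))].add(right)
    return dict(transformed)


# Hypothetical toy relation: domains on the left, IP addresses on the right.
edges = [
    ("a.com", "ip1"), ("b.com", "ip1"),
    ("a.com", "ip2"), ("b.com", "ip2"), ("c.com", "ip2"),
    ("c.com", "ip3"),  # ip3 has only one neighbor and adds no edge
]
graph = transform_bipartite(edges)
```

In this toy input, `ip3` plays the role that the caption assigns to vertices with only one neighbor: it is present in the bipartite graph but contributes nothing to the transformed one.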