Malicious Internet Entity Detection Using Local Graph Inference
Simon Mandlik, Tomas Pevny, Vaclav Smidl, Lukas Bajer
TL;DR
This work tackles malicious Internet entity detection in massive, heterogeneous graphs by reframing the problem as local, neighborhood-based inference. It introduces HMILnet, a Hierarchical Multiple Instance Learning architecture, to process streamlined neighborhood subgraphs around a central vertex, enabling high expressivity and scalable inference without requiring global graph training. Empirically, HMILnet outperforms the Probabilistic Threat Propagation baseline in multi-relational settings and generalizes to unseen domains, with performance improving further when leveraging all available relation types. The approach is practical for large-scale cyber defense, as it relies on raw interaction data and a dynamic denylist, and it offers avenues for incorporating external features and improving interpretability in future work.
Abstract
Detection of malicious behavior in a large network is a challenging problem for machine learning in computer security, since it requires a model with both high expressive power and scalable inference. Existing solutions struggle to achieve both: current approaches tailored to cybersecurity are still limited in expressivity, and methods successful in other domains do not scale well to large volumes of data, rendering frequent retraining impossible. This work proposes a new perspective for learning from graph data that models network entity interactions as a large heterogeneous graph. High expressivity is achieved with the HMILnet neural network architecture, which naturally models this type of data and provides theoretical guarantees. Scalability is achieved by pursuing local graph inference, i.e., classifying individual vertices and their neighborhoods as independent samples. Our experiments show an improvement over the state-of-the-art Probabilistic Threat Propagation (PTP) algorithm, a further threefold accuracy improvement when additional data is used, which is not possible with the PTP algorithm, and generalization of the method to new, previously unseen entities.
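
To make the idea of local graph inference concrete, the following is a minimal sketch of how a single vertex and its neighborhood might be turned into one independent sample. It is an illustration under stated assumptions, not the authors' implementation: the triple format, the relation names, the toy entities, and the helpers `build_adjacency` and `local_sample` are all hypothetical, and the HMILnet model that would consume such a sample is omitted.

```python
from collections import defaultdict

# Hypothetical toy data: (source, relation, target) interaction triples.
# Entity and relation names are made up for illustration only.
EDGES = [
    ("evil.example", "resolves_to", "10.0.0.1"),
    ("shop.example", "resolves_to", "10.0.0.1"),
    ("evil.example", "queried_by", "client-7"),
    ("news.example", "queried_by", "client-7"),
    ("news.example", "resolves_to", "10.0.0.2"),
]


def build_adjacency(edges):
    """Index neighbors of every vertex by relation type (edges treated as undirected)."""
    adj = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in edges:
        adj[src][rel].add(dst)
        adj[dst][rel].add(src)
    return adj


def local_sample(adj, center, depth=2, parent=None):
    """Unroll the neighborhood of `center` into a nested tree, one bag per relation.

    Assumption of this sketch: the vertex we arrived from is skipped so the
    tree does not immediately bounce back to its parent.
    """
    neighbors = {}
    if depth > 0:
        for rel, nbrs in sorted(adj[center].items()):
            children = [
                local_sample(adj, n, depth - 1, parent=center)
                for n in sorted(nbrs)
                if n != parent
            ]
            if children:
                neighbors[rel] = children
    return {"vertex": center, "neighbors": neighbors}


adj = build_adjacency(EDGES)
# Each central vertex yields its own sample; no global graph pass is needed.
sample = local_sample(adj, "evil.example", depth=2)
```

Because every sample depends only on a bounded neighborhood, samples for different central vertices can be built and classified independently, which is the property the abstract relies on for scalability. The nested per-relation bags mirror the kind of hierarchical input a multiple instance learning model operates on.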
