Identification of Device Dependencies Using Link Prediction

Lukáš Sadlek; Martin Husák; Pavel Čeleda

TL;DR

The paper addresses the challenge of identifying device dependencies in large, dynamic networks using passively collected IP flows. It introduces a latent-graph, link-prediction approach that relies on time-constrained, directed random walks to generate IP-address embeddings, from which dependency embeddings are formed and used to train a dependency classifier. Key contributions include a novel constrained-walk embedding pipeline inspired by Node2Vec, the ability to detect multiple dependency types (DD, LR, RR) and transitive forms (TD, TD3), and an evaluation showing acceptable performance on cyber-defense and campus datasets with AUC around $0.63$–$0.74$ and AP around $0.74$–$0.88$. The method supports batch processing, scales to large data, and remains applicable under privacy-preserving or encrypted flows, offering a practical tool for risk analysis and network management.

Abstract

Devices in computer networks cannot work without essential network services provided by a limited number of devices. Identification of device dependencies determines whether a pair of IP addresses forms a dependency, i.e., whether the host with the first IP address depends on the host with the second one. These dependencies cannot be identified manually in large and dynamically changing networks, yet they matter because of possible unexpected failures, performance issues, and cascading effects. We address the identification of dependencies using a new approach based on graph-based machine learning. The approach performs link prediction on a latent representation of the computer network's communication graph. It samples random walks over IP addresses that fulfill time conditions imposed on network dependencies. A neural network uses the constrained random walks to construct an IP address embedding, a vector space in which IP addresses that often appear together in the same communication chain (i.e., random walk) lie close to each other. A dependency embedding is constructed by combining the embedding vectors of the two IP addresses and is used to train the resulting dependency classifier. We evaluated the approach using IP flow datasets from a controlled environment and a university campus network that contain evidence about dependencies. Evaluation of its correctness and of its relationship to other approaches shows that the approach achieves acceptable performance. It can consider all types of dependencies simultaneously and is applicable to batch processing in operational conditions.
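The two core ideas of the abstract, time-constrained directed random walks and a dependency embedding built from two IP address vectors, can be illustrated with a minimal sketch. It is not the authors' implementation: the flow tuple layout `(src, dst, start_time)`, the `max_gap` window, and the element-wise (Hadamard) product used to combine vectors are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random

def time_constrained_walks(flows, max_gap, walk_length, seed=0):
    """Sample one directed walk per flow.

    ASSUMED constraint: each next hop must start no earlier than the
    previous flow and at most `max_gap` time units after it.
    `flows` is a list of (src_ip, dst_ip, start_time) tuples.
    """
    rng = random.Random(seed)
    # Index outgoing flows by source IP: ip -> [(start_time, dst_ip), ...]
    out_flows = {}
    for src, dst, start in flows:
        out_flows.setdefault(src, []).append((start, dst))

    walks = []
    for src, dst, start in flows:
        walk, current, t = [src, dst], dst, start
        while len(walk) < walk_length:
            # Keep only flows that leave `current` inside the time window.
            candidates = [(s, d) for s, d in out_flows.get(current, [])
                          if t <= s <= t + max_gap]
            if not candidates:
                break
            t, current = rng.choice(candidates)
            walk.append(current)
        walks.append(walk)
    return walks

def dependency_embedding(u, v):
    """Combine two IP address vectors element-wise (ASSUMED Hadamard operator)."""
    return [a * b for a, b in zip(u, v)]

# Toy flows: a user contacts the web server, which then queries the database.
flows = [("user", "web", 0.0), ("web", "db", 0.5), ("web", "dns", 10.0)]
walks = time_constrained_walks(flows, max_gap=1.0, walk_length=3)
```

In this toy input the walk starting at the user reaches the database because the web server's query follows within the window, while the much later DNS flow is excluded. The resulting walks would then be fed to a skip-gram style embedding model, and the combined vectors to a binary classifier.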

Paper Structure (15 sections, 5 equations, 4 figures, 4 tables)

Introduction
Related Work
Method for Identification of Dependencies
Implementation of the Method
Sampling and Data Preprocessing
Random Walks
Splitting of Chains
Embedding and Model Fitting
Evaluation
Datasets
Ground Truth
Properties of the Method
Comparison with Local Similarity Indices
Lessons Learned
Conclusion

Figures (4)

Figure 1: A sequence diagram containing local-remote dependency of the web server on the database server (the first activation of the user device) and remote-remote dependency of the web server on the DNS server (the second activation of the user device). Activations (vertical rectangles) denote participation of lifelines. Time passes from top to bottom in the diagram.
Figure 2: Steps of the proposed approach from processing of input data to obtaining dependency classifier.
Figure 3: Example of network communication between workstations and servers. Solid lines represent a communication chain. All edges consist of forward and reverse IP flows. Numbers are the last octets from IPv4 addresses.
Figure 4: The ROC and PR curves of the proposed approach for team five from the cyber defense exercise.
