Cyber Key Terrain Identification Using Adjusted PageRank Centrality
Lukáš Sadlek, Pavel Čeleda
TL;DR
This work identifies cyber key terrain by translating network position into PageRank centrality scores, augmenting the standard computation with per-edge damping factors $d_{uv}$ learned from the source and destination port pairs observed in IP flows. The authors use hill climbing and random walk to optimize these factors during a one-time learning phase on static graphs, then apply a streaming PageRank computation to IP-flow data, with unseen edges defaulting to $0.85$. On a cyber defense exercise dataset and on campus-network data, the adjusted centrality achieves a higher $F_1$ score than traditional PageRank and demonstrates near-real-time processing on large IP-flow streams. The approach provides a scalable, flow-aware mechanism for prioritizing cyber assets for defense; temporal fluctuations, data-labeling challenges, and memory considerations are left as future work.
Abstract
The cyber terrain contains devices, network services, cyber personas, and other network entities involved in network operations. Designing a method that automatically identifies the network entities that are key to network operations is challenging. However, such a method is essential for determining which cyber assets the cyber defense should focus on. In this paper, we propose an approach for classifying IP addresses as belonging to cyber key terrain according to their network position, using a PageRank centrality computation adjusted by machine learning. We used hill climbing and random walk algorithms to differentiate PageRank's damping factors based on the source and destination ports captured in IP flows. The one-time learning phase on a static data sample allows near-real-time, stream-based classification of key hosts from IP flow data in operational conditions without maintaining a complete network graph. We evaluated the approach on a dataset from a cyber defense exercise and on data from the campus network. The results show that cyber key terrain identification using the adjusted computation of centrality is more precise than its original version.
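
To illustrate the idea of port-dependent damping factors, the following is a minimal sketch, not the method evaluated in the paper. It assumes a batch (non-streaming) power iteration in which each flow edge passes on a share of its source's rank scaled by a damping factor looked up by its (source port, destination port) pair, and any rank not passed along edges is spread uniformly over all hosts. The function name `adjusted_pagerank`, the input layout, and the example damping values are hypothetical; only the default of 0.85 for unseen port pairs comes from the text.

```python
from collections import defaultdict

# Default damping factor for port pairs not seen during learning.
DEFAULT_DAMPING = 0.85


def adjusted_pagerank(edges, damping_by_ports, iterations=50):
    """Batch PageRank with a per-edge damping factor (illustrative assumption).

    edges: list of (src_ip, dst_ip, src_port, dst_port) tuples, one per flow edge.
    damping_by_ports: dict mapping (src_port, dst_port) to a damping factor.
    Returns a dict mapping each IP address to its rank; ranks sum to 1.
    """
    nodes = sorted({e[0] for e in edges} | {e[1] for e in edges})
    out_degree = defaultdict(int)
    for src, _dst, _sport, _dport in edges:
        out_degree[src] += 1

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        passed = {n: 0.0 for n in nodes}
        for src, dst, sport, dport in edges:
            d = damping_by_ports.get((sport, dport), DEFAULT_DAMPING)
            passed[dst] += d * rank[src] / out_degree[src]
        # Rank that was damped away or held by hosts without outgoing
        # edges is redistributed uniformly, so the total stays at 1.
        leftover = 1.0 - sum(passed.values())
        rank = {n: passed[n] + leftover / len(nodes) for n in nodes}
    return rank


# Hypothetical example: two clients contact two servers on different ports.
flows = [
    ("10.0.0.1", "10.0.0.10", 50000, 443),
    ("10.0.0.2", "10.0.0.20", 50000, 22),
]
learned = {(50000, 443): 0.95, (50000, 22): 0.5}  # made-up values
ranks = adjusted_pagerank(flows, learned)
```

In this sketch, an edge whose port pair has a larger damping factor transfers more rank to its destination, so the two servers receive different scores even though their positions in the graph are symmetric. A learning phase such as hill climbing would adjust the values in `learned` to maximize a classification metric on labeled data.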
