How the use of feature selection methods influences the efficiency and accuracy of complex network simulations
Katarzyna Musial, Jiaqi Wen, Andreas Gwyther-Gouriotis
TL;DR
The paper addresses the challenge of incorporating real-world node attributes into Social Network Simulations to improve accuracy and efficiency. It introduces FS-SNS, a hybrid feature selection pipeline that first ranks features with unsupervised filter methods and then tests feature subsets via a wrapper, optimizing degree-distribution similarity. The approach yields improvements in 8 of 10 real-world networks and identifies a practical threshold of four features for best results, highlighting the importance of feature relevance and network topology. These findings support using feature selection to enhance Digital Twin and complex network system simulations by enabling more accurate representations with fewer informative attributes.
Abstract
Models of complex network systems aim to emulate real-world networks as closely as possible through simulation and link prediction. Such systems are defined by nodes and their connections, both of which carry real-world features, resulting in heterogeneous networks in which each node has distinct characteristics. Incorporating real-world features is therefore important for achieving a simulation that best represents the real world. Currently, very few complex network systems implement real-world features. This study therefore proposes feature selection methods that utilise unsupervised filtering techniques to rank real-world node features, alongside a wrapper function that tests combinations of the ranked features. The chosen method, named FS-SNS, improved 8 out of 10 simulations of real-world networks. A consistent threshold was also found: including 4 features achieved the most accurate simulation for all networks. Building on these findings, the study proposes future work and discusses how the results can be used to advance the Digital Twin and complex network system fields.
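The filter-then-wrapper idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FS-SNS implementation. The variance-based filter score, the total variation distance used as the degree-distribution measure, and the `simulate` callback (a stand-in for any simulator that returns a simulated degree sequence for a given feature subset) are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import pvariance
from typing import Callable, Dict, List, Sequence, Tuple


def rank_by_variance(features: Dict[str, Sequence[float]]) -> List[str]:
    """Unsupervised filter (assumed): order feature names by descending variance."""
    return sorted(features, key=lambda name: pvariance(features[name]), reverse=True)


def degree_distance(real: Sequence[int], simulated: Sequence[int]) -> float:
    """Total variation distance between two empirical degree distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    p, q = Counter(real), Counter(simulated)
    n_p, n_q = len(real), len(simulated)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[d] / n_p - q[d] / n_q) for d in support)


def select_features(
    features: Dict[str, Sequence[float]],
    real_degrees: Sequence[int],
    simulate: Callable[[List[str]], Sequence[int]],  # hypothetical simulator hook
) -> Tuple[List[str], float]:
    """Wrapper step: try the top-k ranked features for each k and keep the best.

    "Best" means the subset whose simulated degree distribution is closest
    to the real one under degree_distance.
    """
    ranked = rank_by_variance(features)
    best_subset: List[str] = []
    best_score = float("inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        score = degree_distance(real_degrees, simulate(subset))
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Here `select_features` evaluates only nested top-k subsets of the ranking, which keeps the number of simulation runs linear in the number of features. A wrapper that tests arbitrary combinations would need a different search strategy.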
