Complex Networks for Pattern-Based Data Classification
Josimar Chire, Khalid Mahmood, Zhao Liang
TL;DR
This work addresses high-level data classification by using complex networks to capture internal relationships and class formation. It introduces two network-measure classifiers, based on the Minimum Spanning Tree (MST) and the Single Source Shortest Path (SSSP), applied to per-class graphs. A test sample is assigned to the class whose measure changes least when the sample is inserted; each classifier relies on a single measure and requires no weight parameters. Across synthetic and real datasets (Iris, Wine, Penguin, Pulsar, Covid-19), SSSP consistently offers faster execution and competitive accuracy relative to MST and traditional ML methods, supported by runtime and complexity analyses ($O(E \log E)$ for MST and $O(E \log V)$ for SSSP). The results demonstrate the viability of pattern-formation-based network measures for classification and point to future work on dynamic network measures, such as maximal flow, to further improve performance.
Abstract
Data classification techniques partition the data or feature space into smaller subspaces, each corresponding to a specific class. To perform this partitioning, physical features, e.g., distances and distributions, are used. This approach struggles to characterize complex patterns embedded in the dataset. Complex networks, by contrast, are a powerful tool for capturing internal relationships and class structures, enabling high-level classification. Although several complex network-based classification techniques have been proposed, high-level classification that leverages pattern formation to classify data has not been explored. In this work, we present two network-based classification techniques that use unique measures derived from the Minimum Spanning Tree and the Single Source Shortest Path. These network measures are evaluated on the data patterns represented by the inherent network constructed from each class. We have applied the proposed techniques to several data classification scenarios, including synthetic and real-world datasets. Compared with existing classic high-level and machine-learning classification techniques, the proposed approaches achieve promising numerical results. Furthermore, the proposed models have the following distinguishing features relative to previous high-level classification techniques: (1) A single network measure is introduced to characterize the data pattern, eliminating the need to determine weight parameters among network measures. The model is therefore considerably simpler, while obtaining better classification results. (2) The proposed metrics are sensitive and yield competitive classification results.
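The decision rule summarized in the TL;DR (assign a test sample to the class whose network measure changes least on insertion) can be illustrated with a minimal sketch for the MST measure. Several details here are assumptions and are not taken from the paper: each class is modeled as a complete graph with Euclidean edge weights, the measure is the total MST weight, the "change" is taken as an absolute difference, and the names `mst_weight` and `classify` are hypothetical. Prim's algorithm on a dense graph is used for brevity, so this sketch runs in $O(V^2)$ per tree rather than the $O(E \log E)$ quoted above.

```python
import math


def mst_weight(points):
    """Total weight of the MST of the complete Euclidean graph on `points`.

    Assumption: complete graph, Euclidean weights (the paper's graph
    construction may differ). Dense Prim's algorithm, O(V^2).
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax connecting edges for the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def classify(sample, classes):
    """Return the label whose MST weight changes least when `sample` is added.

    `classes` maps a label to a list of training points (tuples).
    Assumption: "change" means the absolute difference in MST weight.
    """
    deltas = {
        label: abs(mst_weight(pts + [sample]) - mst_weight(pts))
        for label, pts in classes.items()
    }
    return min(deltas, key=deltas.get)


# Toy data: two well-separated classes (illustrative only).
classes = {
    "a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "b": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
print(classify((0.5, 0.5), classes))  # prints: a
```

Because only one measure is compared across classes, no weighting between measures has to be tuned, which is the simplification the abstract highlights. An SSSP variant would follow the same pattern with a shortest-path-based measure on a sparser graph.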
