A Clustering Method with Graph Maximum Decoding Information
Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, Jin Yan, Shan Jiang, Zhiming Ding
TL;DR
This work introduces CMDI, a graph-based clustering framework that maximizes decoding information by combining two-dimensional structural information theory with graph structure extraction and vertex partitioning. It formalizes one- and two-dimensional structural entropy and defines decoding information (DI) as the difference between them. A greedy algorithm, GDIMAOP (greedy DI-Maximized approximating optimal partition), partitions the graph to maximize this quantity, and the CMDI-PK variant additionally exploits prior knowledge (PK). Empirical results on synthetic and real-world geospatial datasets show that CMDI and CMDI-PK achieve higher decoding information ratios (DI-R) and converge faster than the baselines. Graph reconstruction is guided by maximum-likelihood-based structure extraction and by the choice of proximity metric. The approach offers a principled, information-theoretic route to robust graph-based clustering, with practical value in domains that require reliable discovery of natural data associations and efficient clustering under uncertainty.
Abstract
Clustering methods based on graph models have attracted growing attention because they apply across many knowledge domains. Because they integrate readily with other applications, graph-based clustering analyses can robustly extract the "natural associations" or "graph structures" within datasets, which makes it easier to model relationships between data points. Despite this efficacy, current graph-based clustering methods overlook both the uncertainty associated with random-walk access between nodes and the structural information embedded in the data. To address this gap, we present a novel Clustering method for Maximizing Decoding Information within graph-based models, named CMDI. CMDI incorporates two-dimensional structural information theory into the clustering process, which consists of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem in which decoding information is maximized in order to minimize the uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets demonstrate that CMDI outperforms classical baseline methods, achieving a higher decoding information ratio (DI-R). Furthermore, CMDI is more efficient, particularly when prior knowledge (PK) is taken into account. These findings underscore the effectiveness of CMDI in improving decoding information quality and computational efficiency, positioning it as a valuable tool for graph-based clustering analyses.
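
To make the central quantity concrete, the following is a minimal sketch of how decoding information could be evaluated for a given partition of an unweighted, undirected graph. It assumes the standard definitions from structural information theory, which the abstract does not spell out: with degree d_i, total degree 2m, module volume V_j and module cut size g_j, the one-dimensional entropy is H1 = -sum_i (d_i/2m) log2(d_i/2m), the two-dimensional entropy is H2 = -sum_j sum_{i in X_j} (d_i/2m) log2(d_i/V_j) - sum_j (g_j/2m) log2(V_j/2m), and decoding information is H1 - H2. The function names, the edge-list input, and the toy graph are illustrative assumptions, not the authors' implementation, and the sketch only scores a fixed partition rather than searching for one as GDIMAOP does.

```python
import math
from collections import defaultdict


def structural_entropies(edges, partition):
    """Return (H1, H2) for an unweighted, undirected graph without self-loops.

    edges:     list of (u, v) pairs
    partition: dict mapping each vertex to a module label (assumed input format)
    """
    degree = defaultdict(int)  # d_i
    cut = defaultdict(int)     # g_j: edges with exactly one endpoint in module j
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            cut[partition[u]] += 1
            cut[partition[v]] += 1

    two_m = sum(degree.values())  # total degree, 2m
    volume = defaultdict(int)     # V_j: sum of degrees inside module j
    for node, d in degree.items():
        volume[partition[node]] += d

    # One-dimensional structural entropy (entropy of the degree distribution).
    h1 = -sum(d / two_m * math.log2(d / two_m) for d in degree.values())

    # Two-dimensional structural entropy under the given partition.
    h2 = 0.0
    for node, d in degree.items():
        h2 -= d / two_m * math.log2(d / volume[partition[node]])
    for module, vol in volume.items():
        h2 -= cut[module] / two_m * math.log2(vol / two_m)
    return h1, h2


def decoding_information(edges, partition):
    """Decoding information: uncertainty removed by the partition, H1 - H2."""
    h1, h2 = structural_entropies(edges, partition)
    return h1 - h2


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {v: "all" for v in range(6)}

print(decoding_information(edges, by_triangle))  # about 0.857 (6/7)
print(decoding_information(edges, one_module))   # 0 up to rounding
```

Under these assumed definitions, splitting the toy graph along the bridge yields positive decoding information, while placing every vertex in a single module yields none, which matches the intuition that a good partition reduces the uncertainty of locating a randomly visited vertex.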
