Node ranking in labeled networks
Chamalee Wickrama Arachchi, Nikolaj Tatti
TL;DR
The paper addresses the problem of ranking nodes in labeled, weighted, directed graphs by constructing an explainable hierarchy: a label tree that uses node labels to partition the nodes, with each leaf of the tree corresponding to a rank. The quality of a hierarchy is measured by agony, $q=\sum_{e=(u,v)\in E} w(e)\,\max(0,\, r(u)-r(v)+1)$, where $r(u)$ is the rank of node $u$, so an edge $(u,v)$ is penalty-free only if $r(u) < r(v)$. The optimization problem, L-agony, asks for a label tree with at most $k$ leaves that minimizes $q$. The authors prove that this problem is NP-hard and inapproximable, and propose a greedy divide-and-conquer algorithm that builds the label tree by iteratively selecting the best split, maintaining counters so that the gain of each candidate split can be computed efficiently; a dynamic-programming pruning step enforces a user-specified leaf budget. Empirical evaluation on synthetic and real-world labeled networks shows that the method can recover ground-truth or near-ground-truth hierarchies, produce interpretable hierarchies, and scale to large graphs with tens of thousands of nodes and edges. The work provides a practical approach to explainable hierarchical ranking in labeled networks and opens avenues for extending label-consistency constraints and robustness analyses.
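To make the agony formula concrete, the following minimal sketch evaluates $q$ for a given rank assignment. The function name `agony`, the edge-triple representation, and the toy graph are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge representation: (source u, target v, weight w(e)).
Edge = Tuple[Hashable, Hashable, float]


def agony(edges: Iterable[Edge], rank: Dict[Hashable, int]) -> float:
    """Sum of w(e) * max(0, r(u) - r(v) + 1) over all edges e = (u, v)."""
    total = 0.0
    for u, v, w in edges:
        d = rank[u] - rank[v]
        total += w * max(0, d + 1)
    return total


# Toy example (assumed data): a weighted 3-cycle.
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "a", 3.0)]
rank = {"a": 0, "b": 1, "c": 2}

# Edges a->b and b->c go from a lower to a strictly higher rank: no penalty.
# Edge c->a has d = 2, so it contributes 3.0 * (2 + 1) = 9.0.
score = agony(edges, rank)
print(score)  # 9.0
```

Placing all nodes in a single rank gives every edge a penalty of 1, so the score reduces to the total edge weight (6.0 for this toy graph).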
Abstract
The entities in directed networks arising from real-world interactions are often naturally organized under some hierarchical structure. Given a directed, weighted graph with node labels, we introduce a ranking problem where the obtained hierarchy should be described using node labels. Such a method has the advantage of not only ranking the nodes but also providing an explanation for the ranking. To this end, we define a binary tree called a label tree, where each leaf represents a rank and each non-leaf contains a single label, which is then used to partition, and consequently rank, the nodes in the input graph. We measure the quality of trees using the agony score, a penalty score that penalizes edges from higher ranks to lower ranks based on the severity of the violation. We show that the problem is NP-hard, and even inapproximable if we limit the size of the label tree. Therefore, we resort to heuristics and design a divide-and-conquer algorithm that runs in $\mathcal{O}((n + m) \log n + \ell R)$ time, where $R$ is the number of node-label pairs in the given graph, $\ell$ is the number of nodes in the resulting label tree, and $n$ and $m$ denote the number of nodes and edges, respectively. We also report an experimental study showing that our algorithm can be applied to large networks, that it can find the ground truth in synthetic datasets, and that it can produce explainable hierarchies in real-world datasets.
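As an illustration of how a label tree partitions and ranks nodes, here is a minimal sketch. The class `LabelTree`, the function `assign_ranks`, and the toy labels are hypothetical names introduced for this example. The sketch also assumes that leaves are numbered left to right and that the child holding the nodes with the label is visited first; the abstract does not state the paper's ordering convention, which may differ.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Set


@dataclass
class LabelTree:
    # Internal nodes carry a label and two children; leaves carry neither.
    label: Optional[str] = None
    with_label: Optional["LabelTree"] = None      # nodes that have the label
    without_label: Optional["LabelTree"] = None   # nodes that lack the label


def assign_ranks(tree: LabelTree,
                 labels: Dict[Hashable, Set[str]]) -> Dict[Hashable, int]:
    """Route every graph node down the tree; each leaf becomes one rank.

    Assumption: leaves are numbered in visiting order, with the
    `with_label` subtree visited before the `without_label` subtree.
    """
    rank: Dict[Hashable, int] = {}
    next_rank = 0

    def visit(node: LabelTree, members: List[Hashable]) -> None:
        nonlocal next_rank
        if node.label is None:  # leaf: all remaining members share a rank
            for v in members:
                rank[v] = next_rank
            next_rank += 1
            return
        has = [v for v in members if node.label in labels[v]]
        lacks = [v for v in members if node.label not in labels[v]]
        visit(node.with_label, has)
        visit(node.without_label, lacks)

    visit(tree, list(labels))
    return rank


# Toy example (assumed data): three nodes with label sets.
labels = {"ceo": {"exec", "manager"}, "lead": {"manager"}, "dev": set()}
tree = LabelTree(
    label="exec",
    with_label=LabelTree(),
    without_label=LabelTree(
        label="manager",
        with_label=LabelTree(),
        without_label=LabelTree(),
    ),
)
ranks = assign_ranks(tree, labels)
print(ranks)  # {'ceo': 0, 'lead': 1, 'dev': 2}
```

Because every split is a test on a single label, the path from the root to a leaf reads as an explanation of why a node received its rank, for example "not exec, but manager" for rank 1 in this toy tree.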
