Finding coherent node groups in directed graphs

Iiro Kumpulainen; Nikolaj Tatti

TL;DR

The paper studies directed graphs with node features and formulates directed graph segmentation (dgs): partitioning the nodes into an ordered sequence of coherent groups so as to minimize the intra-group $L_2$ loss plus penalties on cross-edges, with separate weights for forward and backward edges. It develops an exact MILP, an iterative heuristic framework (dgs-partition, dgs-centroid, dgs-sort), and LP-based approximation methods with provable guarantees, plus polynomial-time exact solutions for tree inputs and for the $k=2$ case. The authors prove NP-hardness and APX-hardness results for general instances, and derive a $(k-1)$-approximation via LP rounding and, in the symmetric case $\lambda_f = \lambda_b$, a $(k+1)/3$ bound. Experiments on synthetic and real networks show practical performance and interpretable partitions. The work provides a toolbox for structure-aware clustering of directed networks with node features, with potential extensions to other loss functions, feature types, and edge-centric models.

Abstract

Grouping the nodes of a graph into clusters is a standard technique for studying networks. We study a problem where we are given a directed network and are asked to partition the graph into a sequence of coherent groups. We assume that nodes in the network have features, and we measure the group coherence by comparing these features. Furthermore, we incorporate the cross edges by penalizing the forward cross edges and backward cross edges with different weights. If the weights are set to 0, then the problem is equivalent to clustering. However, if we penalize the backward edges, the order of discovered groups matters, and we can view our problem as a generalization of a classic segmentation problem. We consider a common iterative approach where we solve the groups given the centroids, and then find the centroids given the groups. We show that, unlike in clustering, the first subproblem is NP-hard. However, we show that we can solve the subproblem exactly if the underlying graph is a tree or if the number of groups is 2. For a general case, we propose an approximation algorithm based on linear programming. We propose 3 additional heuristics: (1) optimizing each pair of groups separately while keeping the remaining groups intact, (2) computing a spanning tree and then optimizing using only the edges in that, and (3) a greedy search moving nodes between the groups while optimizing the overall loss. We demonstrate with our experiments that the algorithms are practical and yield interpretable results.
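
The objective and the greedy heuristic (3) described above can be illustrated with a minimal sketch. It assumes that group coherence is the squared Euclidean distance of each node's features to its group mean, that every edge has unit weight, and that groups are numbered in their sequence order. The names `dgs_cost` and `greedy_moves` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def dgs_cost(features, edges, labels, k, lam_f, lam_b):
    """Intra-group squared L2 loss plus weighted cross-edge penalties."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    cost = 0.0
    for g in range(k):
        members = features[labels == g]
        if len(members) > 0:
            # Assumption: the centroid is the mean of the group's features.
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    for u, v in edges:
        if labels[u] < labels[v]:
            cost += lam_f  # forward cross edge (earlier group -> later group)
        elif labels[u] > labels[v]:
            cost += lam_b  # backward cross edge (later group -> earlier group)
    return float(cost)

def greedy_moves(features, edges, labels, k, lam_f, lam_b, max_rounds=100):
    """Move single nodes between groups while the total cost decreases."""
    labels = np.array(labels)
    best = dgs_cost(features, edges, labels, k, lam_f, lam_b)
    for _ in range(max_rounds):
        improved = False
        for v in range(len(labels)):
            for g in range(k):
                if g == labels[v]:
                    continue
                old = labels[v]
                labels[v] = g
                cost = dgs_cost(features, edges, labels, k, lam_f, lam_b)
                if cost < best - 1e-12:
                    best, improved = cost, True  # keep the move
                else:
                    labels[v] = old  # undo the move
        if not improved:
            break
    return labels, best
```

Recomputing the full cost for every candidate move keeps the sketch short but is slow; a practical implementation would update the cost incrementally. With `lam_f = lam_b = 0` the cost reduces to a plain clustering loss, matching the remark in the abstract that zero weights make the problem equivalent to clustering.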

Paper Structure (13 sections, 15 theorems, 48 equations, 2 figures, 4 tables, 6 algorithms)

Introduction
Preliminary notation and problem definition
Related work
Mixed-integer linear program solving directed graph segmentation
Iterative approach
Computational complexity
Special cases for finding the optimal partition exactly in polynomial time
Linear programming approaches for finding an approximately optimal partition
Relaxation and rounding algorithm
Linear program approach for a case when $\lambda_f = \lambda_b$
Additional heuristics for solving directed graph segmentation
Experimental evaluation
Concluding remarks

Key Result

Theorem 1

Solving Eq. eq:main_lp_objective subject to Eqs. eq:lp_start--eq:lp_d_start solves dgs. The number of variables and constraints in the MILP is in $\mathit{\mathcal{O}}$.

Figures (2)

Figure 1: Algorithm average runtimes across 5 runs as a function of number of vertices (top left), number of features (top right), and number of clusters (bottom left) on synthetic graphs. Both axes are logarithmic.
Figure 2: Mean Adjusted Rand Index over 10 runs between the ground truth and the partition chosen by the algorithms as a function of the probability $p$ of reassigning vertices with new random features from a random cluster. On synthetic trees, the values for LPiter and TreeDP are overlapping.

Theorems & Definitions (28)

Theorem 1
Theorem 2
proof
Theorem 3
proof
Theorem 4
proof
Theorem 5
proof
Theorem 6
