Node classification in networks via simplicial interactions

Eunho Koo, Tongseok Lim

TL;DR

This work addresses node classification by exploiting higher-order interactions in networks. It introduces a simplicial objective function that penalizes label diversity within higher-order cliques and pairs it with a novel Stochastic Block Tensor Model (SBTM) to generate graphs with realistic higher-order motifs. The authors show that incorporating higher-order structures yields improved classification performance, especially under challenging conditions such as low homo-connection probability and limited prior labels, and that combining the objective with GNN-based methods provides additional gains. Practically, the approach enables more accurate community detection in networks where higher-order interactions are prevalent, while also highlighting computational considerations and avenues for parallelization and extension to larger label sets.

Abstract

In the node classification task, it is natural to presume that densely connected nodes tend to exhibit similar attributes. Given this, it is crucial to first define what constitutes a dense connection and to develop a reliable mathematical tool for assessing node cohesiveness. In this paper, we propose a probability-based objective function for semi-supervised node classification that takes advantage of higher-order networks' capabilities. The proposed function follows the intuition behind classification in higher-order networks: it is designed to reduce the likelihood that nodes interconnected through higher-order networks bear different labels. Additionally, we propose the Stochastic Block Tensor Model (SBTM), a graph generation model designed to address a significant limitation of the traditional stochastic block model, which does not adequately represent the distribution of higher-order structures in real networks. We evaluate the objective function on networks generated by the SBTM, covering both balanced and imbalanced scenarios. Furthermore, we present an approach that integrates the objective function with graph neural network (GNN)-based semi-supervised node classification methodologies, aiming for additional performance gains. Our results demonstrate that in challenging classification scenarios (characterized by a low probability of homo-connections, a high probability of hetero-connections, and limited prior node information), models based on the higher-order network outperform pairwise interaction-based models. Experimental results also suggest that integrating our proposed objective function with existing GNN-based node classification approaches enhances classification performance by efficiently learning higher-order structures distributed in the network.
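As a rough illustration of the idea of reducing the likelihood that nodes joined by a higher-order structure bear different labels, the sketch below scores a single clique from per-node label probabilities. It is not the paper's objective, whose exact form is not reproduced here: the independence assumption, the function name `disagreement_probability`, and the toy probability vectors are all assumptions made for illustration.

```python
from math import prod


def disagreement_probability(label_probs):
    """Probability that the nodes of one clique do NOT all share a label.

    label_probs: one probability vector (over labels) per node in the clique.
    Assumption for illustration only: node labels are independent.
    """
    num_labels = len(label_probs[0])
    # P(every node takes label c), summed over all labels c.
    agree = sum(prod(p[c] for p in label_probs) for c in range(num_labels))
    return 1.0 - agree


# Hypothetical 3-node clique (a triangle) with two possible labels.
confident = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]  # nodes lean the same way
mixed = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]      # nodes disagree

print(disagreement_probability(confident))
print(disagreement_probability(mixed))
```

Under these assumptions, a clique whose nodes lean toward the same label incurs a smaller penalty than one whose nodes lean toward different labels, so minimizing the sum of such terms over cliques would push densely connected nodes toward a common label.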

Paper Structure (15 sections, 9 equations, 5 figures, 2 tables)

This paper contains 15 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overall optimization process. First, we classify $k$-simplices (for $k=1,2,…$) in a given network based on their sizes (Breaking Down). Then, we multiply each size category by a distinct multinomial coefficient, sum these weighted values, and use this combined information to train a model (Training). Finally, we evaluate the model's classification performance (Classification).
  • Figure 2: Accuracy gains for balanced SI-3, SI-4, and SI-5 experiments with respect to variations in homo-connection probability (left), hetero-connection probability (middle), and prior information ratio (right) in the $N$=[200,200,200,200,200] setting. In the left panel, the diagonal entries of $B_3$, $B_4$, and $B_5$ vary from 0.01 to 0.05, from 0.05 to 0.10, and from 0.12 to 0.17, respectively. In the middle panel, the off-diagonal entries of $B_3$, $B_4$, and $B_5$ vary from 0.001 to 0.005, from 0.005 to 0.010, and from 0.012 to 0.017, respectively. In the right panel, performance is tested from 10 known nodes (prior information ratio, PIR, of 0.01) to 90 known nodes (PIR 0.09).
  • Figure 3: Accuracy gains for imbalanced SI-3, SI-4, and SI-5 experiments with respect to variations in homo-connection probability (left), hetero-connection probability (middle), and prior information ratio (right) in the $N$=[300,300,100,100,100,100] setting. In the left panel, the diagonal entries of $B_3$, $B_4$, and $B_5$ vary from 0.03 & 0.06 to 0.04 & 0.08 (that is, the diagonal entries of $B_3$ vary from [0.03, 0.03, 0.06, 0.06, 0.06, 0.06] to [0.04, 0.04, 0.08, 0.08, 0.08, 0.08]), from 0.04 & 0.08 to 0.07 & 0.14, and from 0.11 & 0.17 to 0.14 & 0.23, respectively. In the middle panel, the off-diagonal entries of $B_3$, $B_4$, and $B_5$ vary from 0.006 to 0.008, from 0.008 to 0.014, and from 0.017 to 0.023, respectively. In the right panel, performance is tested from 10 known nodes (prior information ratio, PIR, of 0.01) to 90 known nodes (PIR 0.09).
  • Figure 4: Accuracy gains with weights $w_k=\alpha^{k-1}$ in the objective for imbalanced SI-5 experiments with respect to variations in homo-connection probability (left), hetero-connection probability (middle), and prior information ratio (right) in the $N$=[300,300,100,100,100,100] setting. In the left panel, the diagonal entries of $B_5$ vary from [0.11, 0.11, 0.17, 0.17, 0.17, 0.17] to [0.14, 0.14, 0.23, 0.23, 0.23, 0.23]. In the middle panel, the off-diagonal entries of $B_5$ vary from 0.017 to 0.023. In the right panel, performance is tested from 10 known nodes (prior information ratio, PIR, of 0.01) to 90 known nodes (PIR 0.09).
  • Figure :
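
The "Breaking Down" step described in the Figure 1 caption, together with the size-dependent weights $w_k=\alpha^{k-1}$ named in the Figure 4 caption, can be sketched on a toy graph as follows. This is a minimal illustration rather than the authors' implementation: it takes a $k$-simplex to be a clique on $k+1$ nodes, uses brute-force enumeration that is only practical for very small graphs, and the function names, the edge list, and the value of `alpha` are hypothetical.

```python
from itertools import combinations


def simplices_by_size(edges, max_k):
    """Group the cliques of an undirected graph by simplex dimension k.

    A k-simplex is taken here to be a clique on k + 1 nodes.
    Brute force over node subsets: suitable for toy graphs only.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    groups = {}
    for k in range(1, max_k + 1):
        # Keep a (k + 1)-subset only if every pair inside it is an edge.
        groups[k] = [
            s for s in combinations(nodes, k + 1)
            if all(b in adj[a] for a, b in combinations(s, 2))
        ]
    return groups


def simplex_weights(max_k, alpha):
    """Weights w_k = alpha ** (k - 1), one per simplex dimension k."""
    return {k: alpha ** (k - 1) for k in range(1, max_k + 1)}


# Hypothetical toy graph: a 4-clique on nodes 0..3 plus a pendant edge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3), (3, 4)]
groups = simplices_by_size(edges, max_k=3)
weights = simplex_weights(max_k=3, alpha=2.0)  # alpha chosen arbitrarily

for k in groups:
    print(k, len(groups[k]), weights[k])
```

With `alpha` greater than 1, larger simplices receive more weight than edges; with `alpha` equal to 1, all sizes are weighted equally. A per-simplex penalty would then be multiplied by the weight of its size category and summed to form the training signal.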

Theorems & Definitions (2)

  • Example 1
  • Example 2