GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection

Alessandro Bicciato, Luca Cosmo, Giorgia Minello, Luca Rossi, Andrea Torsello

TL;DR

GNN-LoFI replaces standard message passing with a localized feature distribution analysis over egonets, using learned dictionaries and histograms connected by a differentiable histogram-intersection kernel. Each LoFI layer yields a vector of similarity scores across multiple masks, which are pooled and passed to an MLP for graph-level prediction, while enabling interpretability by revealing influential masks. Empirically, it achieves state-of-the-art or competitive results on graph classification and regression benchmarks with runtime comparable to traditional GNNs, illustrating the practicality of distribution-based neighborhood representations. The work broadens graph representation learning by combining local distributional information with end-to-end trainable histograms, and suggests future extensions to jointly learn feature distributions and graph structure using alternatives like Earth mover’s distance.

Abstract

Graph neural networks are increasingly becoming the framework of choice for graph-based machine learning. In this paper, we propose a new graph neural network architecture that substitutes classical message passing with an analysis of the local distribution of node features. To this end, we extract the distribution of features in the egonet for each local neighbourhood and compare them against a set of learned label distributions by taking the histogram intersection kernel. The similarity information is then propagated to other nodes in the network, effectively creating a message passing-like mechanism where the message is determined by the ensemble of the features. We perform an ablation study to evaluate the network's performance under different choices of its hyper-parameters. Finally, we test our model on standard graph classification and regression benchmarks, and we find that it outperforms widely used alternative approaches, including both graph kernels and graph neural networks.

Paper Structure (16 sections, 5 equations, 6 figures, 3 tables)

This paper contains 16 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The proposed GNN-LoFI architecture. The input graph is fed into one or more LoFI layers, where the feature distributions on egonets centered at each node are compared to a set of learned histograms. The output is a new set of real-valued feature vectors associated with the graph nodes. We obtain a graph-level feature vector by pooling the node features, which is then fed to an MLP to output the final classification label.
  • Figure 2: Feature-based (soft) histogram computation, where we simplify the notation by omitting the layer number $l$. (a) Given a graph $\mathcal{G}$ with 8 nodes, we extract the 1-hop egonet $\mathcal{N}_v$, a subgraph of $\mathcal{G}$ centered on the vertex $v$; its nodes are colour-coded for clarity. (b) $D_j$ is the $j$-th learned dictionary at the current layer. It has 7 entries (words), each of the same size as the node features. $X_v$ is the set of the 5 input node feature vectors associated with the 5 nodes of $\mathcal{N}_v$, colour-coded to match the corresponding nodes in (a). (c) For each node feature $\mathbf{x}_u \in X_v$ we compute the normalized similarity score with respect to each dictionary word $\mathbf{w}_i \in D_j$. The similarity values for each feature vector $\mathbf{x}_u$ are shown as a bar plot, where the height of a bar indicates the similarity of $\mathbf{x}_u$ to the corresponding word (words are $1,2,\dots,7$). (d) To obtain the feature (soft) histogram $h(X_v;D_j)$ of $X_v$, for each word $\mathbf{w}_i$ of the learned dictionary $D_j$, we sum the normalized similarities computed in (c) over every node feature $\mathbf{x}_u \in X_v$.
  • Figure 3: The histogram intersection kernel at layer $l$, where we simplify the notation by omitting the layer number $l$. Given a node $v$ in $\mathcal{N}_v$ and the associated histograms $\mathbf{h}_{v,j} = h(X_v;D_j)$, the histogram intersection operation is repeated for each mask $\mathcal{M}_j = (D_j,\mathbf{f}_j)$, where the $j$-th mask is the pair consisting of the learned dictionary $D_j$ and the learned histogram $\mathbf{f}_j$. The intersection between $h(X_v;D_j)$ and $\mathbf{f}_j$ is defined in terms of the absolute difference between the two histograms. The resulting positive real value represents the similarity between the feature distribution associated with the node $v$ and the learned histogram, and becomes the $j$-th entry of the updated node feature $\mathbf{z}_v$.
  • Figure 4: Average classification accuracy on the (a) NCI1 and (b) PROTEINS datasets as we vary both the egonet radius and the number of layers.
  • Figure 5: Average classification accuracy on the (a) NCI1 and (b) PROTEINS datasets as we vary the number of trained masks (top) and the dictionary size (bottom).
  • ...and 1 more figure
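
The captions of Figures 2 and 3 describe a two-step node update: build a soft histogram of the egonet features over a dictionary, then intersect it with a learned histogram, once per mask. The following is a minimal sketch of that update in plain Python, not the authors' implementation. Several details are assumptions made for illustration: the similarity is taken to be a dot product normalized with a softmax over the dictionary words (the captions only say "normalized similarity score"), the learned histograms are assumed non-negative, and the function names `soft_histogram`, `histogram_intersection` and `lofi_node_update` are hypothetical. The intersection uses the standard identity $\min(a,b) = \tfrac{1}{2}(a + b - |a - b|)$, which is one way to express it through an absolute difference.

```python
import math


def soft_histogram(features, dictionary):
    """Soft histogram h(X_v; D_j) with one bin per dictionary word.

    ASSUMPTION: similarity = dot product, normalized per node with a
    softmax over the words. The source does not specify the normalization.
    """
    hist = [0.0] * len(dictionary)
    for x in features:
        scores = [sum(a * b for a, b in zip(x, w)) for w in dictionary]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for i, e in enumerate(exps):
            hist[i] += e / total  # each node adds unit mass in total
    return hist


def histogram_intersection(h, f):
    """sum_i min(h_i, f_i), written via min(a, b) = (a + b - |a - b|) / 2."""
    return sum(0.5 * (a + b - abs(a - b)) for a, b in zip(h, f))


def lofi_node_update(features, masks):
    """Updated node feature z_v: one score per mask M_j = (D_j, f_j)."""
    return [
        histogram_intersection(soft_histogram(features, D), f)
        for D, f in masks
    ]


# Toy egonet with 5 nodes and 2-dimensional features (made-up values).
X_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
# One hypothetical mask: a dictionary of 3 words and a histogram with 3 bins.
D_1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f_1 = [2.0, 2.0, 1.0]

h = soft_histogram(X_v, D_1)
z_v = lofi_node_update(X_v, [(D_1, f_1)])
```

Under the softmax assumption every node contributes a total mass of 1, so the bins of `h` sum to the egonet size, and each entry of `z_v` is bounded above by the smaller of the two histogram masses. In a trained model the dictionaries and histograms would be learned parameters rather than fixed lists.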