Graph as a feature: improving node classification with non-neural graph-aware logistic regression

Simon Delarue, Thomas Bonald, Tiphaine Viard

TL;DR

This paper introduces Graph-aware Logistic Regression (GLR), a non-neural model designed for node classification tasks, and shows that it outperforms both foundational and sophisticated state-of-the-art GNN models.

Abstract

Graph Neural Networks (GNNs) and their message passing framework, which leverages both structural and feature information, have become a standard method for solving graph-based machine learning problems. However, these approaches still struggle to generalise well beyond datasets that exhibit strong homophily, where nodes of the same class tend to connect. This limitation has led to the development of complex neural architectures that pose challenges in terms of efficiency and scalability. In response to these limitations, we focus on simpler and more scalable approaches and introduce Graph-aware Logistic Regression (GLR), a non-neural model designed for node classification tasks. Unlike traditional graph algorithms that use only a fraction of the information accessible to GNNs, our proposed model simultaneously leverages both node features and the relationships between entities. However, instead of relying on message passing, our approach encodes each node's relationships as an additional feature vector, which is then combined with the node's own attributes. Extensive experimental results, conducted within a rigorous evaluation framework, show that our proposed GLR approach outperforms both foundational and sophisticated state-of-the-art GNN models in node classification tasks. Going beyond the traditional limited benchmarks, our experiments indicate that GLR improves generalisation ability while reducing computation time by up to two orders of magnitude compared to its best neural competitor.
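
As a rough illustration of the idea described in the abstract and in Figure 1, the sketch below concatenates a toy adjacency matrix with a feature matrix and fits a logistic regression on the result. The toy graph, features, labels, train/test split, and the hand-written binary gradient-descent solver are all assumptions made for this example; it is not the authors' implementation, and in practice an off-the-shelf logistic regression solver would likely be used instead.

```python
import numpy as np

def fit_logistic_regression(Z, y, lr=0.5, n_steps=500):
    """Binary logistic regression fitted by plain gradient descent
    (an illustrative stand-in for a library solver)."""
    n, d = Z.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(y = 1)
        err = p - y                              # gradient of the log loss w.r.t. the score
        w -= lr * (Z.T @ err) / n
        b -= lr * err.mean()
    return w, b

def predict(Z, w, b):
    """Predict class 1 when the linear score is positive."""
    return (Z @ w + b > 0).astype(int)

# Hypothetical toy graph: two 4-node cliques joined by the edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
n_nodes = 8
A = np.zeros((n_nodes, n_nodes))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected graph, no self-loops

# Hypothetical node features, deliberately uninformative about the class.
X = np.array([[1.0, 0.0], [0.0, 1.0]] * 4)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Graph as a feature: each node's adjacency row is concatenated
# with its own attributes, giving the matrix [A || X].
Z = np.hstack([A, X])

train = np.array([0, 1, 6, 7])
test = np.array([2, 3, 4, 5])
w, b = fit_logistic_regression(Z[train], y[train])
pred = predict(Z[test], w, b)
print(pred)
```

In this toy setting the features carry no class signal, so the held-out nodes are classified from their adjacency rows alone; no message passing step is involved.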

Paper Structure

This paper contains 34 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of our Graph-aware Logistic Regression (GLR) method for node classification. We consider both the topological and attribute information by concatenating ($||$) the graph adjacency and feature matrices. We then feed the result into a logistic regression $f$.
  • Figure 2: Test average accuracy (and standard deviation) for the best GNN and non-neural baselines.
  • Figure 3: Tradeoff between accuracy and computation time. Notice the log scale for the $y$-axis.
  • Figure 4: Node label and feature homophily distributions across graphs. Width of the violins is scaled by the number of nodes in the graph.
  • Figure 5: Cumulative node degree distributions.
  • ...and 1 more figure
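
Figure 4 refers to node label homophily, which the abstract describes as the tendency of same-class nodes to connect. A minimal sketch of one common per-node definition (the fraction of a node's neighbours that share its label) is given below. The function name and the toy path graph are assumptions made for illustration, and the exact measure used in the paper is not stated in this excerpt.

```python
import numpy as np

def node_label_homophily(A, y):
    """Fraction of each node's neighbours that share its label.

    A is a dense, unweighted adjacency matrix and y a vector of class labels.
    Isolated nodes get NaN, since the ratio is undefined for them.
    """
    same_label = (y[:, None] == y[None, :]).astype(float)
    degree = A.sum(axis=1)
    same_label_neighbours = (A * same_label).sum(axis=1)
    return np.divide(
        same_label_neighbours,
        degree,
        out=np.full(len(y), np.nan),
        where=degree > 0,
    )

# Hypothetical toy example: path graph 0 - 1 - 2 - 3 with labels [0, 0, 1, 1].
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
y = np.array([0, 0, 1, 1])

h = node_label_homophily(A, y)
print(h)
```

The two end nodes only touch a same-label neighbour, while the two middle nodes each have one same-label and one different-label neighbour.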