Optimality of Message-Passing Architectures for Sparse Graphs

Aseem Baranwal; Kimon Fountoulakis; Aukosh Jagannath

TL;DR

The paper studies node classification on sparse, feature-decorated graphs in the fixed-dimensional regime and introduces asymptotic local Bayes optimality as a benchmark for classifiers. It formalizes a CSBM data model with edge probabilities $q_{ij}=b_{ij}/n$ and arbitrary feature distributions, derives the Bayes-optimal local classifier, and proves that a decoupled message-passing GNN architecture can implement it. In a Gaussian-feature specialization, it characterizes the generalization error and shows that the optimal method interpolates between an MLP (low graph signal) and a convolution-style GCN (high graph signal); a non-asymptotic analysis extends the result to finite graphs. The results imply that, on sparse graphs, message-passing GNNs can attain optimal node classification among local classifiers, which guides architecture design and clarifies when graph information helps. The performance guarantees are stated under local weak convergence and under non-asymptotic conditions.

Abstract

We study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is $O(1)$ in the number of nodes, in the fixed-dimensional asymptotic regime, i.e., the dimension of the feature data is fixed while the number of nodes is large. Such graphs are typically known to be locally tree-like. We introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier according to this criterion for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. The optimal classifier is implementable using a message-passing graph neural network architecture. We then compute the generalization error of this classifier and compare its performance against existing learning methods theoretically on a well-studied statistical model with naturally identifiable signal-to-noise ratios (SNRs) in the data. We find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical convolution in the regime of high graph signal. Furthermore, we prove a corresponding non-asymptotic result.
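The sparse regime described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: it assumes uniformly random class labels, isotropic Gaussian features, and an edge between a class-$i$ node and a class-$j$ node with probability $b_{ij}/n$. The function names `sample_sparse_csbm` and `mean_degree` and all parameter values are hypothetical choices for illustration.

```python
import random


def sample_sparse_csbm(n, b, means, sigma=1.0, seed=0):
    """Sample labels, features and an adjacency list from a toy sparse CSBM.

    Assumptions (illustrative only): labels are uniform over the classes,
    features are Gaussian with class mean means[c] and standard deviation
    sigma, and nodes u, v are joined with probability b[c_u][c_v] / n.
    """
    rng = random.Random(seed)
    num_classes = len(b)
    labels = [rng.randrange(num_classes) for _ in range(n)]
    features = [[rng.gauss(m, sigma) for m in means[c]] for c in labels]
    neighbours = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Edge probability shrinks like 1/n, so degrees stay O(1).
            if rng.random() < b[labels[u]][labels[v]] / n:
                neighbours[u].append(v)
                neighbours[v].append(u)
    return labels, features, neighbours


def mean_degree(neighbours):
    """Average number of neighbours per node."""
    return sum(len(nb) for nb in neighbours) / len(neighbours)


labels, features, neighbours = sample_sparse_csbm(
    n=800,
    b=[[5.0, 1.0], [1.0, 5.0]],
    means=[[1.0, 0.0], [-1.0, 0.0]],
)
print("mean degree:", mean_degree(neighbours))
```

With two balanced classes and these hypothetical values, the expected degree is close to $(5+1)/2 = 3$ regardless of $n$, which is the $O(1)$ behaviour the abstract refers to.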

Paper Structure (21 sections, 17 theorems, 46 equations, 2 figures)

Introduction
Related Work
Our Contributions
Architecture
Theoretical Analysis and Discussion
Asymptotic Local Bayes Optimality
Data Model
Optimal Classifier
Comparative Study
Non-asymptotic Setting
Proofs
Preliminary Results
Bayes Optimal Classifier
Computing the Classifier
Generalization Error
...and 6 more sections

Key Result

Theorem 1

For any $\ell\geq 1$, the asymptotically $\ell$-locally Bayes optimal classifier of the root for the sequence $(G_n,u_n)\sim \mathrm{CSBM}(n,d,\mathds{P},\mathbf{Q})$ is given by an explicit expression (not reproduced here), in which $\{\rho_i\}_{i\in[C]}$ are the densities associated with the distributions $\mathds{P}_i\in \mathds{P}$, and $\boldsymbol{\rho} = (\rho_i)_{i\in[C]}$.
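For intuition, the one-hop case ($\ell = 1$) can be worked out by hand in a simplified setting. The sketch below is an illustration under stated assumptions, not the paper's statement of Theorem 1: two balanced classes $\pm 1$, intra-class edge probability $a/n$, inter-class edge probability $b/n$, a tree-like neighbourhood, and scalar Gaussian features $N(\pm\mu, \sigma^2)$. Under these assumptions each neighbour shares the root's label with probability $a/(a+b)$, so a neighbour with feature log-likelihood ratio $t$ contributes $\log\frac{a e^{t} + b}{b e^{t} + a}$ to the root's log-posterior ratio. All function names are hypothetical.

```python
import math


def log_mix(x, w_exp, w_const):
    """Return log(w_exp * e^x + w_const), computed stably."""
    if w_const == 0.0:
        return math.log(w_exp) + x
    if w_exp == 0.0:
        return math.log(w_const)
    m = max(x, 0.0)
    return m + math.log(w_exp * math.exp(x - m) + w_const * math.exp(-m))


def gaussian_llr(x, mu, sigma):
    """Log-likelihood ratio of N(+mu, sigma^2) against N(-mu, sigma^2) at x."""
    return 2.0 * mu * x / sigma ** 2


def message(llr, a, b):
    """Neighbour contribution log((a*e^llr + b) / (b*e^llr + a))."""
    return log_mix(llr, a, b) - log_mix(llr, b, a)


def one_hop_score(x_root, x_neighbours, a, b, mu, sigma):
    """Root log-posterior ratio from its own feature plus one-hop messages."""
    score = gaussian_llr(x_root, mu, sigma)
    for x in x_neighbours:
        score += message(gaussian_llr(x, mu, sigma), a, b)
    return score


def classify(x_root, x_neighbours, a, b, mu, sigma):
    """Predict +1 or -1 from the sign of the one-hop score."""
    return 1 if one_hop_score(x_root, x_neighbours, a, b, mu, sigma) >= 0 else -1


x_root, x_nb = 0.3, [-1.0, 0.5, 2.0]
# a == b: the graph carries no signal, so only the root feature matters.
score_mlp = one_hop_score(x_root, x_nb, a=3.0, b=3.0, mu=1.0, sigma=1.0)
# b == 0: every neighbour shares the root's label, so ratios simply add up.
score_sum = one_hop_score(x_root, x_nb, a=5.0, b=0.0, mu=1.0, sigma=1.0)
print(score_mlp, score_sum)
```

In this toy setting the message vanishes when $a = b$, leaving a feature-only (MLP-like) rule, and becomes the identity when $b = 0$, giving a sum over the neighbourhood that is linear in the features, as in a graph convolution. For intermediate values the message saturates at $\pm\log(a/b)$, which limits how much any single neighbour can sway the prediction.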

Figures (2)

Figure 1: Comparison of the proposed architecture against an MLP and a vanilla GCN (Kipf and Welling).
Figure 2: Demonstration of Theorem 3 for extreme graph signals. When $\Gamma=0$, the architecture reduces to an MLP (the $\Gamma=0$ panel), while when $\Gamma=1$, it behaves the same as a GCN (the $\Gamma=1$ panel).

Theorems & Definitions (29)

Definition 3.1: $\ell$-local classifier
Definition 3.2
Theorem 1: Bayes optimal message-passing
Corollary 1.1: Optimal classifier for binary symmetric CSBM
Theorem 2: Generalization error
Theorem 3: Extreme graph signals
Proposition 3.1: Tree neighbourhoods
Theorem 4: Misclassification error for fixed $n$
Lemma 4.1
Lemma 4.2
...and 19 more
