Table of Contents
Fetching ...

Convexified Message-Passing Graph Neural Networks

Saar Cohen, Noa Agmon, Uri Shaham

TL;DR

CGNNs address the non-convex training challenge of message-passing GNNs by mapping nonlinear graph filters into a reproducing kernel Hilbert space, turning learning into a convex optimization problem solvable with projected gradient methods. The framework provides a theoretically rigorous generalization bound that asymptotically matches the best possible GCN, and extends to deeper architectures via layer-wise training. By kernelizing nonlinear activations and relaxing low-rank constraints with nuclear-norm penalties, CGNNs retain the standard GCN modularity while enabling scalable, convex optimization and robust performance. Empirically, CGNNs achieve substantial accuracy gains (often 10–40%) over their non-convex counterparts across diverse graph benchmarks, while maintaining robustness in data-scarce regimes and avoiding heavy over-parameterization.

Abstract

Graph Neural Networks (GNNs) are key tools for graph representation learning, demonstrating strong results across diverse prediction tasks. In this paper, we present Convexified Message-Passing Graph Neural Networks (CGNNs), a novel and general framework that combines the power of message-passing GNNs with the tractability of convex optimization. By mapping their nonlinear filters into a reproducing kernel Hilbert space, CGNNs transform training into a convex optimization problem, which projected gradient methods can solve both efficiently and optimally. Convexity further allows CGNNs' statistical properties to be analyzed accurately and rigorously. For two-layer CGNNs, we establish rigorous generalization guarantees, showing convergence to the performance of an optimal GNN. To scale to deeper architectures, we adopt a principled layer-wise training strategy. Experiments on benchmark datasets show that CGNNs significantly exceed the performance of leading GNN models, obtaining 10-40% higher accuracy in most cases, underscoring their promise as a powerful and principled method with strong theoretical foundations. In rare cases where improvements are not quantitatively substantial, the convex models either slightly exceed or match the baselines, stressing their robustness and wide applicability. Though over-parameterization is often used to enhance performance in non-convex models, we show that our CGNNs yield shallow convex models that can surpass non-convex ones in accuracy and model compactness.

Convexified Message-Passing Graph Neural Networks

TL;DR

CGNNs address the non-convex training challenge of message-passing GNNs by mapping nonlinear graph filters into a reproducing kernel Hilbert space, turning learning into a convex optimization problem solvable with projected gradient methods. The framework provides a theoretically rigorous generalization bound that asymptotically matches the best possible GCN, and extends to deeper architectures via layer-wise training. By kernelizing nonlinear activations and relaxing low-rank constraints with nuclear-norm penalties, CGNNs retain the standard GCN modularity while enabling scalable, convex optimization and robust performance. Empirically, CGNNs achieve substantial accuracy gains (often 10–40%) over their non-convex counterparts across diverse graph benchmarks, while maintaining robustness in data-scarce regimes and avoiding heavy over-parameterization.

Abstract

Graph Neural Networks (GNNs) are key tools for graph representation learning, demonstrating strong results across diverse prediction tasks. In this paper, we present Convexified Message-Passing Graph Neural Networks (CGNNs), a novel and general framework that combines the power of message-passing GNNs with the tractability of convex optimization. By mapping their nonlinear filters into a reproducing kernel Hilbert space, CGNNs transform training into a convex optimization problem, which projected gradient methods can solve both efficiently and optimally. Convexity further allows CGNNs' statistical properties to be analyzed accurately and rigorously. For two-layer CGNNs, we establish rigorous generalization guarantees, showing convergence to the performance of an optimal GNN. To scale to deeper architectures, we adopt a principled layer-wise training strategy. Experiments on benchmark datasets show that CGNNs significantly exceed the performance of leading GNN models, obtaining 10-40% higher accuracy in most cases, underscoring their promise as a powerful and principled method with strong theoretical foundations. In rare cases where improvements are not quantitatively substantial, the convex models either slightly exceed or match the baselines, stressing their robustness and wide applicability. Though over-parameterization is often used to enhance performance in non-convex models, we show that our CGNNs yield shallow convex models that can surpass non-convex ones in accuracy and model compactness.

Paper Structure

This paper contains 43 sections, 9 theorems, 46 equations, 4 figures, 4 tables, 2 algorithms.

Key Result

Lemma 4.1

Assume that the filters $\mathcal{A}_{\ell}$ at each layer $\ell$ satisfy eq:bound cons and eq:rank cons. Then, $\mathcal{A}_{\ell}$'s nuclear norm is upper bounded as $\|\mathcal{A}_{\ell}\|_* \leq \mathcal{B}_\ell$ for some $\mathcal{B}_\ell>0$ that is a function of $R_\ell , F_\ell, F_{\ell-1}, K

Figures (4)

  • Figure 1: Convexification Procedure of a message-passing GNN into a CGNN.
  • Figure 2: Heatmap of classification accuracy (%) across all models and datasets. Rows correspond to models, columns to datasets. The names of convex models are bolded to emphasize their performance.
  • Figure 3: Accuracy comparison of two-layer convex GNNs (prefixed with ‘C’) against their standard two-layer and six-layer non-convex counterparts across five graph classification datasets. Bars represent the mean accuracy over four runs, and error bars denote the standard deviation. Datasets are ordered from left to right by increasing size.
  • Figure 4: Radar plots showing the performance of each two-layer convex model across five graph classification datasets. Each axis corresponds to a dataset, with higher values indicating better classification accuracy. These plots illustrate the consistency and robustness of convex models across diverse graph types.

Theorems & Definitions (21)

  • Remark 3.1
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Remark 4.4
  • Remark 4.5
  • Remark 4.6
  • Theorem 4.7
  • proof
  • proof
  • ...and 11 more