GLL: A Differentiable Graph Learning Layer for Neural Networks

Jason Brown, Bohan Chen, Harris Hardiman-Mostow, Jeff Calder, Andrea L. Bertozzi

TL;DR

This work introduces the Graph Learning Layer (GLL), a differentiable layer that integrates graph-based label propagation directly into neural network training. By deriving exact adjoint-based backpropagation through graph Laplace equations and similarity-graph construction, GLL jointly learns feature representations while performing graph-informed classification, replacing the traditional MLP head and softmax. The approach yields smoother embeddings, improved generalization, and notably stronger adversarial robustness across datasets and architectures, particularly at low label rates. Extensive experiments, including large-scale CIFAR-10 and EMNIST and ablations on toy and over-parameterized models, demonstrate the practical viability and benefits of end-to-end differentiable graph learning in deep networks.

Abstract

Standard deep learning architectures used for classification generate label predictions with a projection head and a softmax activation function. Although successful, these methods fail to leverage the relational information between samples when generating label predictions. In recent works, graph-based learning techniques, in particular Laplace learning, have been heuristically combined with neural networks for both supervised and semi-supervised learning (SSL) tasks. However, prior works either approximate the gradient of the loss function with respect to the graph learning algorithm or decouple the two processes, so end-to-end integration with neural networks is not achieved. In this work, we derive backpropagation equations, via the adjoint method, for the inclusion of a general family of graph learning layers in a neural network. The resulting method, distinct from graph neural networks, allows us to precisely integrate similarity graph construction and graph Laplacian-based label propagation into a neural network layer, replacing the projection head and softmax activation function for general classification tasks. Our experimental results demonstrate smooth label transitions across data, improved generalization and robustness to adversarial attacks, and improved training dynamics compared to a standard softmax-based approach.
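
To make the two steps named in the abstract concrete (similarity graph construction, then graph Laplacian-based label propagation), here is a minimal NumPy sketch of classical Laplace learning, i.e. the harmonic extension of one-hot labels. It is an illustration under stated assumptions, not the paper's implementation: the fully connected Gaussian graph, the bandwidth `sigma`, the unnormalized Laplacian $L = D - W$, and the function names `gaussian_weights` and `laplace_learning` are all hypothetical choices. The paper's layer covers a more general family of graph learning problems and may build its graph differently (for example with k-nearest neighbors).

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    """Fully connected Gaussian similarity weights w_ij = exp(-|x_i - x_j|^2 / sigma^2).

    Assumption for illustration: a dense graph; a kNN graph is a common alternative.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip round-off
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplace_learning(X, labeled_idx, Y_labeled, sigma=1.0):
    """Propagate one-hot labels Y_labeled from rows labeled_idx of X to all other rows.

    Solves the graph Laplace equation (L u)_i = 0 at every unlabeled node, with the
    given labels as boundary values: L_uu U_u = -L_ul Y_l = W_ul Y_l.
    """
    n = X.shape[0]
    W = gaussian_weights(X, sigma)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian D - W
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    U_u = np.linalg.solve(L_uu, W_ul @ Y_labeled)
    return unlabeled_idx, U_u

if __name__ == "__main__":
    # Two well-separated 1-D clusters; one labeled point per cluster (rows 0 and 3).
    X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
    idx, U = laplace_learning(X, np.array([0, 3]), np.eye(2))
    print(idx, U.argmax(axis=1))
```

In a GLL-style pipeline, `X` would be the encoder's features for a batch. Because the harmonic extension of the all-ones labeling is the all-ones vector, each row of `U_u` sums to 1, so the rows can be read as soft class scores for the unlabeled samples.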

Paper Structure

This paper contains 46 sections, 3 theorems, 95 equations, 12 figures, 10 tables.

Key Result

Theorem 5

Assume $z\mapsto \sigma(z,x)$ and $q\mapsto \phi(q,x,y)$ are continuously differentiable for all $x,y\in\mathcal{X}$. Let $f\in\ell^{2}(\mathcal{X})$, $g:\mathcal{L}\to\mathbb{R}$, and suppose $u\in\ell^{2}(\mathcal{X})$ satisfies equation eq:elliptic_pde. If the adjoint equation eq:adjoint_pde …
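
Theorem 5 concerns a general nonlinear family of graph equations. The simplest instance of the same adjoint idea is a linear solve $Au=b$ with loss $J=g(u)$. Differentiating gives $A\,du = db - dA\,u$, so with the adjoint vector $p$ defined by $A^\top p = \nabla g(u)$ we get $dJ = p^\top db - p^\top dA\,u$, that is, $\partial J/\partial b = p$ and $\partial J/\partial A = -p\,u^\top$. A single transposed solve replaces forming $A^{-1}$. The NumPy sketch below (the name `solve_with_adjoint` is hypothetical) implements only this linear special case; it is not Theorem 5, and it omits the further chain rule through the similarity weights that the paper also differentiates.

```python
import numpy as np

def solve_with_adjoint(A, b, grad_g):
    """Solve A u = b and return u with dJ/dA and dJ/db for the loss J = g(u).

    grad_g(u) must return dg/du. Only one extra (transposed) linear solve is
    needed, no matter how many entries A and b have.
    """
    u = np.linalg.solve(A, b)
    p = np.linalg.solve(A.T, grad_g(u))  # adjoint equation: A^T p = dg/du
    dJ_db = p                             # from dJ = p^T db - p^T dA u
    dJ_dA = -np.outer(p, u)               # so dJ/dA_ij = -p_i u_j
    return u, dJ_dA, dJ_db

if __name__ == "__main__":
    # Small symmetric positive definite system with loss g(u) = 0.5 * ||u - t||^2.
    A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    t = np.array([0.0, 1.0, 0.0])
    u, dJ_dA, dJ_db = solve_with_adjoint(A, b, lambda v: v - t)

    # Sanity check: central finite difference on a single entry of A.
    def J(A_):
        return 0.5 * np.sum((np.linalg.solve(A_, b) - t) ** 2)

    eps = 1e-6
    E = np.zeros_like(A)
    E[0, 1] = eps
    print(dJ_dA[0, 1], (J(A + E) - J(A - E)) / (2 * eps))
```

The finite-difference comparison is the usual way to validate a hand-derived adjoint before trusting it in training.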

Figures (12)

  • Figure 1: Visualization of the graph learning layer (GLL) within a neural network pipeline compared to a standard multilayer perceptron (MLP) projection head. For the GLL, a combination of labeled and unlabeled input images is batched and normalized as input to the feature extractor. The feature network encodes the images in feature space. The GLL consists of two steps: it generates a similarity graph from the encoded data and then propagates the labels across the graph, giving predictions on the unlabeled data. These predictions are used in the desired loss function, and gradients flow through the GLL to update the feature extractor. By contrast, the MLP projection head pipeline does not involve any initially labeled nodes, and its classification ignores relational information in the encodings. The $^+$ indicates learnable parameters in the network; note that the GLL has none.
  • Figure 2: Comparison between the classic neural network structure and our proposed graph learning layer (GLL) structure for image classification. Both structures consist of a feature encoder and a classifier, but we replace the multilayer perceptron (MLP) classifier (one or more linear and activation layers followed by a final softmax) with our data-dependent GLL, which has no trainable parameters. We reserve a subsample of the training data, denoted base samples, to use as the labeled nodes for label propagation. The samples that are not in the base set are used to compute the loss.
  • Figure 3: The initial embedding of the two moons dataset for the fixed random seed. Blue and red correspond to the two classes, and the starred samples are the base samples for the respective classes.
  • Figure 4: Visualization of the effect of $\tau$ on spatial embeddings. Each row corresponds to a graph learning model with a specified value of $\tau$, or to a softmax classification head. The columns correspond to training epochs. With low values of $\tau$, the model separates the data by class while still resembling the original two moons shape. Increasing $\tau$ forces tighter clustering and localization. The softmax layer simply learns a linear separation with no regard to relational information; further training with the softmax layer only increases the scale of the embedding, which is not visible because the axes are not shown.
  • Figure 5: Training loss and test accuracy of four training strategies on the FashionMNIST dataset with the customized feature encoder in Table tab:structure_mlp_gll. The strategies are: (1) MLP trained from scratch, (2) GLL trained from scratch (GLL-0), (3) GLL-50 (GLL after 50 epochs of MLP training), and (4) GLL-75 (GLL after 75 epochs of MLP training). For the MLP-only strategy, both the MLP and GLL losses and accuracies are recorded during training; for the remaining strategies, only the GLL classifier is used.
  • ...and 7 more figures
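
Figures 1 and 2 describe reserving base samples from the training data as labeled nodes and scoring the loss only on the remaining samples of each batch. The NumPy sketch below shows one way such batches could be assembled. It rests on assumptions the captions do not state: a fixed number of base samples per class (`per_class`), every batch containing all base samples, and the hypothetical helper names `choose_base_samples` and `make_batches`.

```python
import numpy as np

def choose_base_samples(labels, per_class, rng):
    """Reserve `per_class` training indices from every class as base (labeled) nodes."""
    base = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        base.extend(rng.choice(idx, size=per_class, replace=False))
    return np.sort(np.array(base))

def make_batches(labels, base_idx, batch_size, rng):
    """Yield (batch_idx, is_base) pairs for one epoch.

    Assumption: every batch holds all base samples (the labeled nodes for label
    propagation) plus up to `batch_size` shuffled non-base samples. Only rows with
    is_base == False would contribute to the loss.
    """
    non_base = np.setdiff1d(np.arange(len(labels)), base_idx)
    perm = rng.permutation(non_base)
    for start in range(0, len(perm), batch_size):
        chunk = perm[start:start + batch_size]
        batch_idx = np.concatenate([base_idx, chunk])
        is_base = np.zeros(len(batch_idx), dtype=bool)
        is_base[:len(base_idx)] = True
        yield batch_idx, is_base

if __name__ == "__main__":
    labels = np.repeat(np.arange(3), 10)  # toy dataset: 3 classes, 10 samples each
    rng = np.random.default_rng(0)
    base = choose_base_samples(labels, per_class=2, rng=rng)
    for batch_idx, is_base in make_batches(labels, base, batch_size=8, rng=rng):
        print(len(batch_idx), is_base.sum())
```

Keeping the base rows at the front of each batch makes it simple to pass them as the labeled boundary nodes to a propagation step and to mask them out of the loss.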

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Remark 4
  • Theorem 5
  • Lemma 6
  • Remark 7
  • Example 2
  • Lemma 8
  • ...and 4 more