DiGRAF: Diffeomorphic Graph-Adaptive Activation Function

Krishna Sri Ipsit Mantri, Xinzhi Wang, Carola-Bibiane Schönlieb, Bruno Ribeiro, Beatrice Bevilacqua, Moshe Eliasof

TL;DR

DiGRAF is introduced, leveraging Continuous Piecewise-Affine Based (CPAB) transformations, which it augments with an additional GNN to learn a graph-adaptive diffeomorphic activation function in an end-to-end manner.

Abstract

In this paper, we propose a novel activation function tailored specifically for graph data in Graph Neural Networks (GNNs). Motivated by the need for graph-adaptive and flexible activation functions, we introduce DiGRAF, leveraging Continuous Piecewise-Affine Based (CPAB) transformations, which we augment with an additional GNN to learn a graph-adaptive diffeomorphic activation function in an end-to-end manner. In addition to its graph-adaptivity and flexibility, DiGRAF also possesses properties that are widely recognized as desirable for activation functions, such as differentiability, boundedness within the domain, and computational efficiency. We conduct an extensive set of experiments across diverse datasets and tasks, demonstrating consistent and superior performance of DiGRAF compared to traditional and graph-specific activation functions, highlighting its effectiveness as an activation function for GNNs. Our code is available at https://github.com/ipsitmantri/DiGRAF.

Paper Structure

This paper contains 54 sections, 6 theorems, 40 equations, 8 figures, and 16 tables.

Key Result

Proposition 4.0

Given a bounded domain $\Omega=[a,b] \subset \mathbb{R}$ where $a<b$, and any two arbitrary points $x, y \in \Omega$, the maximal difference of a diffeomorphism $T(\cdot; {\bm{\theta}}^{(l)})$ with parameter ${\bm{\theta}}^{(l)}$ in DiGRAF is bounded as follows:

$$\left|T(x; {\bm{\theta}}^{(l)}) - T(y; {\bm{\theta}}^{(l)})\right| \leq |b-a| \exp\left(C_{v^{{\bm{\theta}}^{(l)}}}\right),$$

where $C_{v^{{\bm{\theta}}^{(l)}}}$ is the Lipschitz constant of the CPA velocity field $v^{{\bm{\theta}}^{(l)}}$.
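As a sanity check on the Key Result, the short Python sketch below integrates a one-dimensional CPA velocity field numerically and compares the largest observed gap $|T(x)-T(y)|$ with the Grönwall-type bound $|b-a|\exp(C_{v^{\bm{\theta}}})$. It is an illustration under stated assumptions, not the authors' code: the field is parameterized directly by its values at equally spaced knots (rather than by the coefficients ${\bm{\theta}}$ of Definition 3.2), the velocity is pinned to zero at $a$ and $b$, and the flow is integrated with RK4 instead of in closed form. All function names are hypothetical.

```python
import numpy as np

# Hypothetical parameterization: the CPA field is given by its values at equally
# spaced knots (continuous, affine between knots), not by the theta of Def. 3.2.
def cpa_velocity(knots, values):
    return lambda x: np.interp(x, knots, values)

def lipschitz_constant(knots, values):
    # For a continuous piecewise-affine field the Lipschitz constant is the largest |slope|.
    return float(np.max(np.abs(np.diff(values) / np.diff(knots))))

def integrate_flow(v, x, steps=1000):
    # Numerically integrate d(phi)/dt = v(phi), phi(0) = x, up to t = 1 with classical RK4.
    h = 1.0 / steps
    phi = np.asarray(x, dtype=float)
    for _ in range(steps):
        k1 = v(phi)
        k2 = v(phi + 0.5 * h * k1)
        k3 = v(phi + 0.5 * h * k2)
        k4 = v(phi + h * k3)
        phi = phi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return phi

a, b = -5.0, 5.0
knots = np.linspace(a, b, 6)  # tessellation of Omega into five equal cells
rng = np.random.default_rng(0)
values = np.concatenate([[0.0], rng.normal(size=4), [0.0]])  # zero velocity at a and b (assumption)
v = cpa_velocity(knots, values)
C = lipschitz_constant(knots, values)

xs = np.linspace(a, b, 201)
T = integrate_flow(v, xs)
max_diff = float(T.max() - T.min())  # max over sampled x, y of |T(x) - T(y)|
bound = (b - a) * np.exp(C)
print(f"max |T(x) - T(y)| = {max_diff:.4f}, bound = {bound:.4f}")
```

With the velocity pinned to zero at the endpoints, $T$ maps $\Omega$ onto itself and preserves order, so the observed gap equals $b-a$ and the bound holds with room to spare; the factor $\exp(C)$ is what limits how far the flow can push two nearby points apart.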

Figures (8)

  • Figure 1: Illustration of DiGRAF. Node features $\mathbf{H}^{(l-1)}$ and adjacency matrix $\mathbf{A}$ are fed to a $\textsc{GNN}_{\textsc{layer}}^{(l)}$ to obtain updated intermediate node features $\bar{\mathbf{H}}^{(l)}$, which are passed to our activation function layer, DiGRAF. First, an additional GNN, $\textsc{GNN}_{\textsc{act}}$, takes $\bar{\mathbf{H}}^{(l)}$ and $\mathbf{A}$ as input to determine the activation function parameters ${\bm{\theta}}^{(l)}$. These are used to parameterize the transformation $T^{(l)}$, which operates on $\bar{\mathbf{H}}^{(l)}$ to produce the activated node features $\mathbf{H}^{(l)}$.
  • Figure 2: Approximation of traditional activation functions using CPAB and Piecewise ReLU with varying segment counts $K \in \{1, 2, 3\}$ on a closed interval $\Omega=[-5,5]$, demonstrating the advantage of utilizing CPAB and its flexibility to model various activation functions.
  • Figure 3: An example of CPA velocity fields $v^{{\bm{\theta}}}$ defined on the interval $\Omega = [-5, 5]$ with a tessellation ${\mathcal{P}}$ consisting of five subintervals. The three parameters ${\bm{\theta}}_1$, ${\bm{\theta}}_2$, and ${\bm{\theta}}_3$ define three distinct CPA velocity fields (velocity-field panel), resulting in separate CPAB diffeomorphisms $f^{\bm{\theta}}(x)$ (diffeomorphism panel).
  • Figure 4: Different transformation strategies: the input function (red), the CPAB transformation (blue), and the DiGRAF transformation (green) on $\Omega = [-5, 5]$, using the same ${\bm{\theta}}$. While CPAB stretches the input, DiGRAF stretches the output, illustrating the distinct effect of each approach.
  • Figure 5: Convergence analysis of DiGRAF compared to baseline activation functions. The plot illustrates the training loss over epochs, showcasing the overall faster convergence of DiGRAF.
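The data flow in Figure 1 can be sketched in a few lines of NumPy. This is a toy illustration of the structure, not the DiGRAF implementation from the linked repository: `ToyGNNLayer` is a hypothetical mean-aggregation layer standing in for both $\textsc{GNN}_{\textsc{layer}}^{(l)}$ and $\textsc{GNN}_{\textsc{act}}$; ${\bm{\theta}}^{(l)}$ is assumed to be a single graph-level vector obtained by mean pooling and squashed with `tanh`; it is read as velocity values at interior knots of $\Omega=[-5,5]$; and entries of $\bar{\mathbf{H}}^{(l)}$ outside $\Omega$ are assumed to pass through unchanged. All class and function names are hypothetical.

```python
import numpy as np

def mean_aggregate(H, A):
    """Mean over each node's closed neighbourhood: D^{-1} (A + I) H."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat @ H / A_hat.sum(axis=1, keepdims=True)

class ToyGNNLayer:
    """Hypothetical one-layer GNN: mean aggregation followed by a linear map."""
    def __init__(self, d_in, d_out, rng):
        self.W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))

    def __call__(self, H, A):
        return mean_aggregate(H, A) @ self.W

def cpab_transform(x, theta, a=-5.0, b=5.0, steps=200):
    """Apply the flow of a CPA velocity field on [a, b] to every entry of x.

    theta holds velocity values at interior knots (hypothetical parameterization);
    the velocity is zero at a and b, so the flow maps [a, b] onto itself.
    Entries outside [a, b] are passed through unchanged (assumption).
    """
    knots = np.linspace(a, b, len(theta) + 2)
    vals = np.concatenate([[0.0], theta, [0.0]])
    v = lambda z: np.interp(z, knots, vals)
    inside = (x >= a) & (x <= b)
    phi = x[inside].astype(float)
    h = 1.0 / steps
    for _ in range(steps):  # RK4 integration of d(phi)/dt = v(phi) over t in [0, 1]
        k1 = v(phi)
        k2 = v(phi + 0.5 * h * k1)
        k3 = v(phi + 0.5 * h * k2)
        k4 = v(phi + h * k3)
        phi = phi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    out = x.astype(float)  # astype returns a copy, so H_bar is not modified
    out[inside] = phi
    return out

class DiGRAFSketch:
    """GNN_act predicts theta from (H_bar, A); T^{(l)} then acts on H_bar element-wise."""
    def __init__(self, d, n_interior_knots, rng):
        self.gnn_act = ToyGNNLayer(d, n_interior_knots, rng)

    def __call__(self, H_bar, A):
        # Graph-level theta via mean pooling and tanh (assumption).
        theta = np.tanh(self.gnn_act(H_bar, A).mean(axis=0))
        return cpab_transform(H_bar, theta)

rng = np.random.default_rng(0)
n, d = 6, 4
A = np.zeros((n, n))
for i in range(n):  # undirected ring graph
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
H = rng.normal(size=(n, d))
gnn_layer = ToyGNNLayer(d, d, rng)
act = DiGRAFSketch(d, n_interior_knots=4, rng=rng)
H_bar = gnn_layer(H, A)  # intermediate features \bar{H}^{(l)}
H_out = act(H_bar, A)    # activated features H^{(l)}
print(H_out.shape)       # (6, 4)
```

Because the velocity is zero at both ends of $\Omega$, the flow maps $\Omega$ onto itself and preserves order, so in this sketch the activated features that start in $[-5,5]$ stay there and the activation applied to a given graph is monotone.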

Theorems & Definitions (17)

  • Definition 3.1: Diffeomorphism on a closed interval $\Omega$
  • Definition 3.2: CPA velocity field $v^{{\bm{\theta}}}$ on $\Omega$
  • Definition 3.3: CPAB Diffeomorphism
  • Proposition 4.0: The boundedness of $T(\cdot; {\bm{\theta}}^{(l)})$ in DiGRAF
  • Definition C.1: Tessellation of a closed interval (Freifeld et al., 2015)
  • Definition C.2: Relation between ${\bm{\theta}}$ and ${v}^{{\bm{\theta}}}$, taken from Freifeld et al. (2017)
  • Proposition C.3: DiGRAF has a closed-form solution
  • Proof
  • Proposition D.1: The Lipschitz Constant of $v^{{\bm{\theta}}}$
  • Proof
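Proposition C.3 and Proposition D.1 rest on the fact that, inside one cell of the tessellation, $v^{\bm{\theta}}(x)=a_c x+b_c$ is affine, so $\dot{x}=a_c x+b_c$ has the explicit solution $\psi(x,t)=x+(e^{ta_c}-1)(x+b_c/a_c)$ (or $x+tb_c$ when $a_c=0$). The Python sketch below chains these cell-wise solutions, jumping from cell boundary to cell boundary until the unit time budget is used up, in the spirit of the closed-form CPAB integration of Freifeld et al.; it is not the paper's code. Assumptions: the field is given by its values at the knots rather than by ${\bm{\theta}}$, the velocity is zero at the domain ends, and all function names are hypothetical. It also shows one way to evaluate the Lipschitz constant of a continuous piecewise-affine field, namely its largest absolute slope.

```python
import math
import numpy as np

def cpa_cells(knots, values):
    """Slopes a_c and intercepts b_c of v(x) = a_c x + b_c on each cell [knots[c], knots[c+1]]."""
    slopes = np.diff(values) / np.diff(knots)
    intercepts = values[:-1] - slopes * knots[:-1]
    return slopes, intercepts

def lipschitz_constant(knots, values):
    # Continuous piecewise-affine v: the Lipschitz constant is the largest |slope|.
    return float(np.max(np.abs(np.diff(values) / np.diff(knots))))

def psi(x, t, a, b):
    """Closed-form solution of dx/dt = a x + b after time t, valid while x stays in one cell."""
    if abs(a) < 1e-12:
        return x + t * b
    return x + math.expm1(t * a) * (x + b / a)

def cpab_closed_form(x, knots, values, t=1.0):
    slopes, intercepts = cpa_cells(knots, values)
    n_cells = len(slopes)
    while True:
        v_x = float(np.interp(x, knots, values))
        if v_x == 0.0:
            return x  # fixed point of the flow
        # Cell in the direction of motion (handles x sitting exactly on a knot).
        c = int(np.searchsorted(knots, x, side="right" if v_x > 0 else "left")) - 1
        c = min(max(c, 0), n_cells - 1)
        a, b = slopes[c], intercepts[c]
        x_b = knots[c + 1] if v_x > 0 else knots[c]
        v_b = a * x_b + b
        if v_b * v_x <= 0:
            tau = math.inf  # velocity vanishes before the boundary, which is never reached
        elif abs(a) < 1e-12:
            tau = (x_b - x) / b  # constant velocity b
        else:
            tau = math.log(v_b / v_x) / a  # hitting time of the cell boundary
        if tau >= t:
            return psi(x, t, a, b)
        x, t = x_b, t - tau  # move to the boundary and continue in the neighbouring cell

knots = np.linspace(-5.0, 5.0, 6)                    # five cells on Omega = [-5, 5]
values = np.array([0.0, 1.0, -0.5, 0.8, 0.3, 0.0])  # zero velocity at both ends (assumption)
xs = np.linspace(-5.0, 5.0, 11)
T = np.array([cpab_closed_form(x, knots, values) for x in xs])
print(np.round(T, 3), lipschitz_constant(knots, values))
```

Since a one-dimensional flow cannot reverse direction, each point visits every cell at most once, so the loop runs at most once per cell; this is what makes the evaluation exact and cheap compared with a generic ODE solver.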