Improving Quaternion Neural Networks with Quaternionic Activation Functions

Johannes Pöppelbaum, Andreas Schwung

TL;DR

This work addresses the suboptimality of elementwise quaternion activations by introducing magnitude- and phase-aware quaternion activations that respect the quaternion space $\mathbb{H}$. It defines design criteria, derives gradients via the GHR calculus, and develops eight activations (e.g., MagnitudeTanh, Quaternion Cardioid, PhaseTanh, PhaseSin) that leverage the polar representation of quaternions. Experiments on CIFAR-10 and SVHN with QVGG-S/11/16 show that the phase-based activations consistently outperform split activations, with PhaseSin often achieving the best scores; two angle definitions ($\psi$ vs. $\theta$) are also analyzed. The findings demonstrate improved gradient flow, interpretability, and performance, suggesting broad applicability of quaternion-aware activations in QNNs and guiding future quaternion activation design.
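For intuition, the polar representation referenced above writes a quaternion $q = a + b\mathrm{i} + c\mathrm{j} + d\mathrm{k}$ as $q = |q|(\cos\theta + \hat{n}\sin\theta)$, where $\hat{n}$ is the unit vector along the imaginary part. Below is a minimal NumPy sketch of this decomposition; the angle $\theta = \arccos(a/|q|)$ shown here is one common convention, while the paper's exact definition of the second angle $\psi$ is not reproduced.

```python
import numpy as np

def polar_decompose(q):
    """Split a quaternion q = (a, b, c, d) into magnitude, unit axis, and angle.

    Uses the convention q = |q| * (cos(theta) + n_hat * sin(theta)),
    with theta = arccos(a / |q|) in [0, pi]; the paper also analyzes a
    second angle convention (psi), not reproduced here.
    """
    a, v = q[0], q[1:]                       # real part, imaginary vector
    mag = np.linalg.norm(q)                  # |q|
    theta = np.arccos(np.clip(a / mag, -1.0, 1.0))
    v_norm = np.linalg.norm(v)
    n_hat = v / v_norm if v_norm > 0 else np.zeros(3)  # unit imaginary axis
    return mag, n_hat, theta

def polar_compose(mag, n_hat, theta):
    """Rebuild q = |q| * (cos(theta) + n_hat * sin(theta))."""
    return mag * np.concatenate(([np.cos(theta)], np.sin(theta) * n_hat))

q = np.array([1.0, 2.0, -1.0, 0.5])
mag, n_hat, theta = polar_decompose(q)
assert np.allclose(polar_compose(mag, n_hat, theta), q)
```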

Abstract

In this paper, we propose novel quaternion activation functions that modify either the quaternion magnitude or the phase, as an alternative to the commonly used split activation functions. We define criteria that are relevant for quaternion activation functions and subsequently propose our novel activation functions based on this analysis. Instead of applying a known activation function like the ReLU or Tanh to the quaternion elements separately, these activation functions consider the quaternion properties and respect the quaternion space $\mathbb{H}$. In particular, all quaternion components are utilized to calculate all output components, carrying the benefit of the Hamilton product, e.g. in the quaternion convolution, over to the activation functions. The proposed activation functions can be incorporated into arbitrary quaternion-valued neural networks trained with gradient descent techniques. We further discuss the derivatives of the proposed activation functions, where we observe beneficial properties for the activation functions affecting the phase: they remain sensitive over essentially the whole input range, so improved gradient flow can be expected. We provide an elaborate experimental evaluation of our proposed quaternion activation functions, including a comparison with the split ReLU and split Tanh, on two image classification tasks using the CIFAR-10 and SVHN datasets. There, the quaternion activation functions affecting the phase in particular consistently provide better performance.
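To make the contrast with split activations concrete, the sketch below (illustrative NumPy only, not the paper's reference implementation; the helper names and the exact transfer functions are assumptions) applies tanh per component, to the magnitude, and to the phase angle of a single quaternion:

```python
import numpy as np

def split_tanh(q):
    """Baseline split activation: tanh applied to each component separately."""
    return np.tanh(q)

def magnitude_tanh(q, eps=1e-12):
    """Sketch of a magnitude-based activation in the spirit of MagnitudeTanh:
    squash |q| through tanh while keeping the quaternion's direction."""
    mag = np.linalg.norm(q)
    return np.tanh(mag) * q / (mag + eps)

def phase_tanh(q, eps=1e-12):
    """Sketch of a phase-based activation (illustrative, not the paper's
    exact PhaseTanh): keep |q| and the imaginary axis, but pass the
    angle theta = arccos(a/|q|) through tanh."""
    a, v = q[0], q[1:]
    mag = np.linalg.norm(q) + eps
    theta = np.arccos(np.clip(a / mag, -1.0, 1.0))
    n_hat = v / (np.linalg.norm(v) + eps)    # unit imaginary axis
    new_theta = np.tanh(theta)               # transformed angle
    return mag * np.concatenate(([np.cos(new_theta)],
                                 np.sin(new_theta) * n_hat))

q = np.array([0.5, 1.0, -2.0, 0.25])
print(split_tanh(q), magnitude_tanh(q), phase_tanh(q))
```

Note that in split_tanh each output component depends only on the corresponding input component, whereas the magnitude- and phase-based variants mix all four components into every output component, which is the property the abstract highlights.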

Paper Structure

This paper contains 31 sections, 31 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Visualization of the quaternion meshgrid used as an input for the visualization of the quaternion activation functions.
  • Figure 2: Visualization of the Norm activation.
  • Figure 3: Visualization of the MagnitudeTanh.
  • Figure 4: Visualization of the QuaternionCardioid.
  • Figure 5: Visualization of the PhaseTanh and PhaseTanhshrink.
  • ...and 9 more figures

Theorems & Definitions (4)

  • Remark
  • Remark
  • Remark
  • Remark