Semiring Activation in Neural Networks

Bart M. N. Smets; Peter D. Donker; Jim W. Portegies

TL;DR

This paper introduces a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions.

Abstract

We introduce a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions in neural networks. Semirings are algebraic structures that describe a generalised notion of linearity, greatly expanding the range of trainable operators that can be included in neural networks. In fact, max- or min-pooling operations are convolutions in the tropical semiring with a fixed kernel. We perform experiments in which we replace the activation functions with trainable semiring-based operators to show that these are viable operations to include in fully connected as well as convolutional neural networks (ConvNeXt). We discuss some of the challenges of replacing traditional activation functions with trainable semiring activations and the trade-offs of doing so.
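The claim that max-pooling is a tropical convolution with a fixed kernel can be illustrated with a minimal sketch. It assumes the max-plus convention (semiring addition is `max`, semiring multiplication is `+`), a correlation-style index order, and an all-zeros kernel as the fixed kernel; the function names `tropical_conv1d` and `max_pool1d` are hypothetical and not taken from the paper.

```python
def tropical_conv1d(x, kernel, stride=1):
    """Max-plus correlation: y[i] = max_j (x[i*stride + j] + kernel[j]).

    Assumed convention: semiring "addition" is max, "multiplication" is +.
    """
    k = len(kernel)
    return [
        max(x[i + j] + kernel[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def max_pool1d(x, size, stride):
    """Plain 1D max-pooling without padding, used as the reference."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


x = [1.0, 3.0, -2.0, 0.5, 4.0, 2.0]

# A zero kernel adds nothing inside the max, so only the window maximum remains.
pooled = max_pool1d(x, size=2, stride=2)
tropical = tropical_conv1d(x, kernel=[0.0, 0.0], stride=2)

# A non-zero kernel shifts each window position before the max is taken,
# which is what makes the operator trainable.
shifted = tropical_conv1d([0.0, 1.0, 2.0], kernel=[0.0, 10.0])
```

With the zero kernel the two results coincide; replacing `max` with `min` gives the min-plus counterpart for min-pooling.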

Paper Structure (24 sections, 32 equations, 2 figures, 4 tables)

Introduction
Trainable activation functions.
Non-standard neurons.
Morphological.
Our approach.
Quasilinear operators from semirings
Logarithmic and tropical semirings
Logarithmic semirings.
Tropical semirings.
In fully connected networks
Architectures
Training
Optimizer & learning rate scheduler.
Initialization
Tropical.
...and 9 more sections

Figures (2)

Figure 1: Network architecture for our fully connected experiments. The Head and Stem modules are linear modules. None of the modules include biases. The layer normalization modules include affine transforms. The number of input features $n$ and number of output classes $c$ are dataset dependent, the internal width parameter $w$ is chosen per experiment. Each network under consideration has the exact same number of parameters per experiment.
Figure 2: Standard and semiring-based ConvNeXt (Liu et al., 2022) blocks compared. Normalization and dropout modules are omitted.

Theorems & Definitions (3)

Definition 1: Semiring
Remark 2: Semimodules and their homomorphisms
Remark 3
