Activation functions enabling the addition of neurons and layers without altering outcomes

Sergio López-Ureña

TL;DR

This work tackles the problem of expanding neural networks without changing their outputs by introducing activation functions that are refinable and sum the identity, enabling two key operations: widening a layer via neuron subdivision and inserting a new layer between existing layers. The approach is grounded in subdivision theory, constructing activations from basic limit functions of convergent schemes, notably spline activations like $\sigma_{B^d}$ with refinability $A=d+1$, $\tau=d/2$, and the identity-summing property on a suitable interval. It also extends to general subdivision-based activations, providing explicit frameworks and pseudocode (Appendix A) for practical implementation. The results offer a principled, parameter-efficient path to function-preserving architecture growth with potential benefits for multi-level training and structural learning, while outlining open questions about higher-order schemes and closed-form derivatives for backpropagation.

Abstract

In this work, we propose activation functions for neural networks that are refinable and sum the identity. This new class of activation functions allows the insertion of new layers between existing ones and/or the increase of neurons in a layer, both without altering the network outputs. Our approach is grounded in subdivision theory. The proposed activation functions are constructed from basic limit functions of convergent subdivision schemes. As a showcase of our results, we introduce a family of spline activation functions and provide comprehensive details for their practical implementation.
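
To make the identity-summing idea concrete, here is a minimal sketch under stated assumptions. It uses the hard clamp `clip01(x) = min(1, max(0, x))` as a stand-in activation (an illustrative choice, not necessarily the paper's $\sigma_{B^d}$), for which $\sum_{i=0}^{N-1} \mathrm{clip01}(z-i) = z$ whenever $z \in [0, N]$. The helper names `matvec`, `insert_identity_layer` and `forward_inserted` are hypothetical and are not taken from the paper's Appendix A.

```python
def clip01(x):
    """Hard clamp to [0, 1]; an assumed stand-in for a spline-type activation."""
    return min(1.0, max(0.0, x))


def matvec(W, x):
    """Plain-list matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def insert_identity_layer(W, N):
    """Build (W_in, b_in, W_out) so that, for every z in [0, N]^n,
    W_out @ clip01(W_in @ z + b_in) equals W @ z."""
    n = len(W[0])
    W_in, b_in = [], []
    for j in range(n):          # one group of N neurons per input coordinate
        for i in range(N):      # neuron i of the group computes clip01(z_j - i)
            W_in.append([1.0 if k == j else 0.0 for k in range(n)])
            b_in.append(-float(i))
    # Each column of W is repeated N times so the group outputs are summed.
    W_out = [[row[j] for j in range(n) for _ in range(N)] for row in W]
    return W_in, b_in, W_out


def forward_inserted(W_in, b_in, W_out, z):
    """Evaluate the two-step map that replaces the single product W @ z."""
    pre = [p + b for p, b in zip(matvec(W_in, z), b_in)]
    return matvec(W_out, [clip01(p) for p in pre])
```

For inputs inside $[0, N]^n$, `forward_inserted(*insert_identity_layer(W, N), z)` agrees with `matvec(W, z)`; outside that interval the clamp saturates and the two maps differ, which is why the property is only claimed on a suitable interval.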

Paper Structure

This paper contains 9 sections, 7 theorems, 72 equations, 1 figure, 3 algorithms.

Key Result

Theorem 3

Let $\sigma^0, \sigma^1$ be activation functions, with $\sigma^0$ refinable as in eq_refinable. Given $W^{0}\in \mathbb{R}^{n_1\times n_{0}}$, $W^1\in\mathbb{R}^{n_{2}\times n_{1}}$ and $b^{0}\in\mathbb{R}^{n_{1}}$, we define the new weights $\overline{W}^{0}\in \mathbb{R}^{(n_1 + A-1)\times n_{0}}$, where $W_{i,:}$ and $W_{:,i}$ denote the $i$-th row and column of a matrix $W$, respectively. Then,
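
As an illustration of the kind of widening Theorem 3 describes, the sketch below splits one hidden neuron into two without changing the network output. It assumes the hard clamp `clip01` as the refinable activation, which satisfies the two-term relation $\mathrm{clip01}(t) = \tfrac12\,\mathrm{clip01}(2t) + \tfrac12\,\mathrm{clip01}(2t-1)$; this corresponds to $A = 2$ in the dimension count $n_1 + A - 1$. The paper's own activations, coefficients and index conventions may differ, and the names `forward` and `split_neuron` are hypothetical.

```python
def clip01(x):
    """Hard clamp to [0, 1]; assumed refinable activation for this sketch."""
    return min(1.0, max(0.0, x))


def forward(W0, b0, W1, x):
    """Two-layer map x -> W1 @ clip01(W0 @ x + b0), using plain lists."""
    hidden = [clip01(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W0, b0)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W1]


def split_neuron(W0, b0, W1, i):
    """Replace hidden neuron i by two neurons, relying on
    clip01(t) = 0.5 * clip01(2t) + 0.5 * clip01(2t - 1)."""
    row2 = [2.0 * w for w in W0[i]]                 # both children see 2 * (w . x)
    new_W0 = W0[:i] + [row2, list(row2)] + W0[i + 1:]
    new_b0 = b0[:i] + [2.0 * b0[i], 2.0 * b0[i] - 1.0] + b0[i + 1:]
    # The outgoing weight of neuron i is shared equally by the two children.
    new_W1 = [r[:i] + [0.5 * r[i], 0.5 * r[i]] + r[i + 1:] for r in W1]
    return new_W0, new_b0, new_W1
```

Calling `split_neuron` on a layer with $n_1$ neurons returns weights for $n_1 + 1$ neurons, and `forward` gives the same output before and after the split up to floating-point rounding.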

Figures (1)

  • Figure 1: An illustration of the refinability relation (eq_introduction_refinable). The continuous blue lines represent the refinable activation functions $\sigma_{B^1},\sigma_{B^2},\text{id}$ (from left to right), while the dashed lines show these same functions scaled by 2, shifted and multiplied by a constant. In each graph, the dashed lines sum to the continuous blue line.

Theorems & Definitions (22)

  • Definition 1
  • Definition 2
  • Theorem 3
  • proof
  • Remark 4
  • Theorem 5
  • proof
  • Remark 6
  • Theorem 7
  • proof
  • ...and 12 more