Learning by the F-adjoint

Ahmed Boughammoura

TL;DR

This work develops and investigates a theoretical framework from which an equilibrium F-adjoint process is derived. This process yields a local learning rule for deep feed-forward networks, and the proposed approach is shown to provide significant improvements over the standard back-propagation training procedure.

Abstract

A recent paper by Boughammoura (2023) describes the back-propagation algorithm in terms of an alternative formulation called the F-adjoint method. In particular, with the F-adjoint algorithm, computing the loss gradient with respect to each weight in the network is straightforward. In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks. Our main result is that, by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process which yields a local learning rule for deep feed-forward networks. Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
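
To make the per-weight gradient computation mentioned in the abstract concrete, here is a minimal sketch of ordinary back-propagation arranged as a forward pass followed by a backward pass. This summary does not reproduce the paper's definitions, so everything here is an assumption for illustration: the names `forward`, `backward`, `X_star`, and `Y_star`, the `tanh` activation, the squared-error loss, and the absence of bias terms. It should not be read as the author's exact F-adjoint formulation.

```python
import math

def matvec(W, x):
    # Matrix-vector product W x, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mat_t_vec(W, y):
    # Transposed product W^T y.
    return [sum(W[i][j] * y[i] for i in range(len(W)))
            for j in range(len(W[0]))]

def forward(weights, x):
    # Forward pass: store (layer input, pre-activation) for every layer.
    trace, X = [], x
    for W in weights:
        Y = matvec(W, X)
        trace.append((X, Y))
        X = [math.tanh(v) for v in Y]  # assumed activation
    return trace, X

def loss(weights, x, y):
    # Assumed loss: J = 0.5 * ||output - y||^2.
    _, out = forward(weights, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def backward(weights, x, y):
    # Backward pass: start from dJ/d(output) and walk the layers in reverse.
    trace, out = forward(weights, x)
    X_star = [o - t for o, t in zip(out, y)]  # dJ/dX^L for squared error
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        X_prev, Y = trace[l]
        # Multiply elementwise by the activation derivative tanh'(Y).
        Y_star = [xs * (1.0 - math.tanh(v) ** 2) for xs, v in zip(X_star, Y)]
        # Weight gradient of layer l is the outer product Y_star X_prev^T.
        grads[l] = [[ys * xp for xp in X_prev] for ys in Y_star]
        # Propagate to the previous layer through W^T.
        X_star = mat_t_vec(weights[l], Y_star)
    return grads
```

Each weight gradient is formed from two vectors available at that layer (the stored input and the backward signal), which is the sense in which the gradient "with respect to each weight" becomes easy to read off once both passes are done.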

Paper Structure

This paper contains 11 sections, 1 theorem, 29 equations, 5 figures, 3 tables, 2 algorithms.

Introduction
Notation and mathematical background
The F-propagation and F-adjoint
Properties of the F-adjoint
Learning by F-adjoint
Non-local learning rule
Local learning rule
Experiments
Results for MNIST dataset
Results for Fashion-MNIST dataset
Conclusion

Key Result

Lemma 3.1

For a fixed data point $(x, y) \in \mathbb{R}^{N_0}\times\mathbb{R}^{N_L}$, with feature vector $x$ and label $y$, and a fixed loss function $J$: if $X^{L}_{*}=\frac{\partial J}{\partial X^{L}}$, then for any $\ell\in\{1,\cdots, L\}$, we have

Figures (5)

Figure 1: A schematic diagram showing the $\mathrm{F}$ and $\mathrm{F}_*$ processes.
Figure 2: Accuracy for MNIST with nonlocal learning rule.
Figure 3: Accuracy for Fashion-MNIST with nonlocal learning rule.
Figure 4: Accuracy for MNIST with local learning rule.
Figure 5: Accuracy for Fashion-MNIST with local learning rule.

Theorems & Definitions (6)

Definition 3.1: Sequence model
Definition 3.2: An F-propagation
Definition 3.3: The F-adjoint of an F-propagation
Lemma 3.1
proof
Definition 3.4: Local F-learning rule
