Stable Attractors for Neural networks classification via Ordinary Differential Equations (SA-nODE)

Raffaele Marino, Lorenzo Giambagli, Lorenzo Chicchi, Lorenzo Buffoni, Duccio Fanelli

TL;DR

Although this method does not reach the performance of state-of-the-art deep learning algorithms, it illustrates that continuous dynamical systems with closed analytical interaction terms can serve as high-performance classifiers.

Abstract

A novel approach to supervised classification is presented, which sits at the intersection of machine learning and dynamical systems theory. Unlike other methodologies that employ ordinary differential equations for classification, the untrained model is constructed a priori to accommodate a set of pre-assigned stationary stable attractors. Classifying amounts to steering the dynamics towards one of the planted attractors, depending on the specific item supplied as input. Asymptotically, the system hence converges to a specific point of the explored multi-dimensional space, which flags the category of the object to be classified. In this context, the ability to perform classification, as acquired ex post by the trained model, is ultimately reflected in the shape of the basins of attraction associated with each of the target stable attractors. The performance of the proposed method is tested against simple toy models crafted for the purpose, as well as against well-established reference standards. Although this method does not reach the performance of state-of-the-art deep learning algorithms, it illustrates that continuous dynamical systems with closed analytical interaction terms can serve as high-performance classifiers.
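The idea of planting stable attractors can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cubic local force derived from a double-well potential with minima at $\pm a$, the coupling matrix built as $A = \Phi \Lambda \Phi^{-1}$ with zero eigenvalues on the planted columns and negative eigenvalues elsewhere, the orthogonal completion of the basis, the toy sizes, and the names `velocity` and `classify`. Only the explicit Euler integration and the values $T=40.0$, $\Delta t=0.1$ echo the figure captions. No training is performed; the sketch only shows that the planted points are stable fixed points and that a nearby input relaxes onto them.

```python
import numpy as np

rng = np.random.default_rng(0)
N, a = 9, 1.0  # hypothetical: 9 "pixels"; attractor entries are +/- a

# Two hypothetical planted attractors (one column per class), entries +/- a.
planted = a * np.array([[1, 1, 1, -1, -1, -1, -1, -1, -1],
                        [-1, -1, -1, 1, 1, 1, -1, -1, -1]], dtype=float).T
K = planted.shape[1]

# Complete the basis with orthonormal directions orthogonal to the planted span
# (an illustrative choice that makes the coupling matrix symmetric).
Q, _ = np.linalg.qr(np.hstack([planted, rng.normal(size=(N, N - K))]))
Phi = np.hstack([planted, Q[:, K:]])

# Eigenvalue 0 on the planted directions, negative eigenvalues on the others,
# so the coupling term vanishes exactly on each planted attractor.
Lam = np.diag(np.concatenate([np.zeros(K), -rng.uniform(0.5, 1.0, N - K)]))
A = Phi @ Lam @ np.linalg.inv(Phi)

def velocity(x):
    """Right-hand side: double-well force -V'(x) plus linear coupling A x."""
    return -x * (x**2 - a**2) + A @ x

def classify(x0, T=40.0, dt=0.1):
    """Integrate with explicit Euler and return (nearest attractor index, final state)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(T / dt))):
        x = x + dt * velocity(x)
    dists = np.linalg.norm(planted - x[:, None], axis=0)
    return int(np.argmin(dists)), x

# Usage: a slightly perturbed copy of the second pattern relaxes back onto it.
x0 = planted[:, 1] + rng.uniform(-0.02, 0.02, size=N)
label, x_final = classify(x0)
print(label, np.max(np.abs(x_final - planted[:, 1])))
```

With these choices the Jacobian at each planted point is $-2a^2 I + A$, whose eigenvalues are all negative, which is why the planted points are linearly stable. In the actual method the interaction terms are trained so that the basins of attraction separate the classes; this sketch leaves them random.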

Paper Structure (10 sections, 8 equations, 12 figures)

Figures (12)

  • Figure 1: Panel (a): Schematic representation of the dynamical model employed. Each neuron is uniquely associated with a single pixel of the image to be classified. The local dynamics are driven by a double-well potential, as pictorially depicted. Panel (b): Schematic representation of the discrete Euler version of the examined continuous dynamical model, implemented as a recurrent neural network.
  • Figure 2: Representation and corresponding targets (asymptotic attractors) of the letters 'A', 'B', 'C', 'D', and 'E' as $7 \times 7$ grayscale images. The top row shows each letter, enclosed within rectangular borders. The middle row displays the same letters with a noise factor of $\epsilon=0.2$ applied, introducing slight distortions. The bottom row presents the target pattern for each letter, with the black line of the target for 'B' starting immediately after the line of the target for 'A', the target for 'C' starting after 'B', 'D' after 'C', and 'E' after 'D', each surrounded by a sea of white pixels. This configuration allows for precise mapping to each corresponding letter; the targets serve as the planted attractors within the matrix $\Phi$, as illustrated in the main body of the paper. Recall in particular that the attractors are built from the two entry values $\pm a$: white pixels correspond to $-a$ and black pixels to $a$.
  • Figure 3: The figure displays the relationship between $m_f$ and the testing noise $\epsilon_{test}$ for various levels of training noise, for 5 classes. Each line represents a different training noise level, ranging from $0.0$ to $1.0$, with a corresponding color code. The horizontal line illustrates a reference point. For this analysis we set $T=40.0$, $\Delta t=0.1$.
  • Figure 4: Panel (a): the temporal evolution of the Mean Squared Error (MSE) across a sample of $2000$ images of the letter 'D', for various $\epsilon_{test}$ values at a fixed $\epsilon_{train} = 0.3$. For scenarios where $\epsilon_{test} < \epsilon_{train}$, the MSE tends towards zero as time $t$ approaches infinity. Conversely, when $\epsilon_{test} \geq \epsilon_{train}$, the temporal estimate of the MSE increases with added noise. For this analysis we set $\Delta t=0.1$. Panel (b): the histogram of the MSE associated with different trajectories for $\epsilon_{train}=0.3$ and $\epsilon_{test}=0.6$. A significant fraction of the supplied images are correctly classified. For visual clarity, all non-zero values are designated as 1.
  • Figure 5: In this three-dimensional plot, the $x$-axis shows the variance, $\sigma_{A}$, of the Gaussian perturbation applied to a specific component of the coefficient vector $\vec{c}^{(A)}$ corresponding to the letter 'A', the $y$-axis shows the index $j$ of the perturbed component of $\vec{c}^{(A)}$, and the $z$-axis shows the sample Mean Squared Error (MSE). Notice that the original image can be written as $\vec{x}^{(A)} = \Phi \vec{c}^{(A)}$, where $\Phi$ is the transformation matrix. Consequently, the coefficient vector $\vec{c}^{(A)}$ can be obtained by applying the inverse of $\Phi$. In this representation, we perturb exactly one component of $\vec{c}^{(A)}$ at a time and compute the corresponding MSE. The indices on the $y$-axis are arranged in ascending order of MSE, thereby illustrating how strongly perturbations of the different components of $\vec{c}^{(A)}$ affect the overall error in the reconstructed image.
  • ...and 7 more figures
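
The coefficient-space perturbation described in the Figure 5 caption can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' procedure: `Phi` is a random stand-in for the paper's matrix, the image `x` is a random $\pm a$ pattern with $a=1$, the MSE is taken between the perturbed reconstruction $\Phi \vec{c}'$ and the original image (the caption does not say whether the dynamics are run first), and the names `reconstruction_mse` and `sample_mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 49                                 # a 7x7 image, flattened
Phi = rng.normal(size=(N, N))          # hypothetical stand-in for the paper's Phi
x = rng.choice([-1.0, 1.0], size=N)    # hypothetical image with entries +/- a, a = 1

# Coefficients of the image in the basis given by the columns of Phi: x = Phi c.
c = np.linalg.solve(Phi, x)

def reconstruction_mse(j, delta):
    """MSE between the original image and the one rebuilt after shifting c[j] by delta."""
    c_pert = c.copy()
    c_pert[j] += delta
    return float(np.mean((Phi @ c_pert - x) ** 2))

def sample_mse(j, variance, n_samples=200):
    """Average MSE over Gaussian perturbations of component j with the given variance."""
    deltas = rng.normal(0.0, np.sqrt(variance), size=n_samples)
    return float(np.mean([reconstruction_mse(j, d) for d in deltas]))

# Rank the components by their sensitivity, as on the y-axis of the figure.
mse = np.array([sample_mse(j, variance=0.01) for j in range(N)])
order = np.argsort(mse)                # indices sorted by ascending MSE
print(order[:5], mse[order[:5]])
```

Under this reading the error for a single shift has a closed form, $\mathrm{MSE} = \delta^2 \lVert \Phi_{\cdot j} \rVert^2 / N$, so the ranking simply reflects the norms of the columns of $\Phi$. If the MSE is instead measured after integrating the dynamics, the ranking would also depend on the basins of attraction.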