Items or Relations -- what do Artificial Neural Networks learn?

Renate Krause, Stefan Reimann

TL;DR

The paper investigates whether an artificial neural network learns training items or the relations among them by analyzing a minimal two-layer auto-associator trained to identically reproduce a small training set $X$ with symmetry group $\Sigma_X$. It shows that linear auto-associators encode the symmetry relations and generalize to symmetry-compatible patterns via a plane attractor, while non-linear auto-associators tend to memorize individual items unless the non-linearity yields an effectively linear regime (e.g., $\tanh$ near zero). Regularization can shift networks toward linear behavior, improving generalization for activations with a linear regime, and the ARC example demonstrates learning and applying symmetry-based rules. Collectively, the results highlight how network weights reflect relational structure in the data, and they suggest strategies to promote relation-based generalization by leveraging symmetry and linearizable regimes.

Abstract

What has an Artificial Neural Network (ANN) learned after being successfully trained to solve a task: the set of training items or the relations between them? This question is difficult to answer for modern applied ANNs because of their enormous size and complexity. Therefore, here we consider a low-dimensional network and a simple task, i.e., the network has to reproduce a set of training items identically. We construct the family of solutions analytically and use standard learning algorithms to obtain numerical solutions. These numerical solutions differ depending on the optimization algorithm and the weight initialization and are shown to be particular members of the family of analytical solutions. In this simple setting, we observe that the general structure of the network weights represents the training set's symmetry group, i.e., the relations between training items. As a consequence, linear networks generalize, i.e., reproduce items that were not part of the training set but are consistent with the symmetry of the training set. In contrast, non-linear networks tend to learn individual training items and show associative memory, and their ability to generalize is limited. A higher degree of generalization is obtained for networks whose activation function contains a linear regime, such as tanh. Our results suggest that an ANN's ability to generalize, instead of learning items, strongly depends on the applied non-linearity and could be improved by generating a sufficiently big set of elementary operations to represent relations.
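
The claim that a linear network reproduces unseen items consistent with the training set can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the two-layer network is collapsed into a single weight matrix `W`, the training set `X` is a hypothetical pair of items in three dimensions, and `W` is obtained from the pseudoinverse rather than by gradient-based training. The pseudoinverse gives one particular member of the family of solutions of $WX = X$, namely the orthogonal projector onto the span of the training items.

```python
import numpy as np

# Hypothetical training set (not from the paper): two items in R^3,
# stored as the columns of X.
X = np.array([[1.0, 1.0],
              [2.0, 3.0],
              [3.0, 2.0]])

# One member of the solution family of W X = X: the orthogonal projector
# onto the column space of X, built from the Moore-Penrose pseudoinverse.
W = X @ np.linalg.pinv(X)


def iterate(W, x, steps=25):
    """Apply the linear map x <- W x repeatedly and return the final state."""
    for _ in range(steps):
        x = W @ x
    return x


# The training items are reproduced identically.
reproduces_training_set = np.allclose(W @ X, X)

# An unseen item that lies in the plane spanned by the training items
# is also a fixed point of the linear network.
unseen = 0.5 * X[:, 0] - 2.0 * X[:, 1]
unseen_is_fixed = np.allclose(iterate(W, unseen), unseen)

# An input off the plane is mapped onto the plane and then stays there.
off_plane = np.array([1.0, 0.0, 0.0])
landed = iterate(W, off_plane)
landed_is_fixed = np.allclose(W @ landed, landed)
```

In this sketch every point of the plane spanned by the training items is a fixed point, which is the sense in which the linear network sets up a plane attractor. A trained network would generally be a different member of the solution family, so the numbers here are illustrative only.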

Paper Structure

This paper contains 16 sections, 13 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Network dynamics across activation functions. Evolution of 18 input elements $x$ under repeated application of $W$ for networks with linear (left), sigmoid (middle), and tanh (right) activation function $\varphi$ (the dark blue line shows the flow field at $n=0$, the light blue line at $n=6$; fixed points are defined as $x$ at $n=25$). These extended flow fields illustrate how the linear network sets up a plane attractor and is therefore able to represent any input element located on this plane. In contrast, the non-linear networks implement fixed points and show a lower generalization ability.
  • Figure 2: Symmetry example. $(32)$ is a symmetry operation on the training set, while the rotation $(231)$ is not.
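
The distinction drawn in Figure 2 can be checked mechanically: a permutation of the components is a symmetry operation if it maps the set of training items onto itself. The sketch below is illustrative only. The training set behind Figure 2 is not given in this summary, so `items` is a hypothetical stand-in, and the zero-based index tuples `swap_last_two` and `rotation` are an assumed reading of the cycle notation $(32)$ and $(231)$.

```python
def is_symmetry(items, perm):
    """Return True if permuting the components of every item by `perm`
    maps the set of items onto itself.

    `perm[k]` is the index of the old component that moves to position k.
    """
    original = {tuple(item) for item in items}
    permuted = {tuple(item[i] for i in perm) for item in items}
    return permuted == original


# Hypothetical training set of two items with three components each.
items = [(1, 2, 3), (1, 3, 2)]

# Swap the second and third components (assumed reading of "(32)").
swap_last_two = (0, 2, 1)

# Cyclic rotation of all three components (assumed reading of "(231)").
rotation = (1, 2, 0)

swap_is_symmetry = is_symmetry(items, swap_last_two)
rotation_is_symmetry = is_symmetry(items, rotation)
```

For this stand-in set the swap exchanges the two items and so leaves the set unchanged, while the rotation produces items that are not in the set.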