Untrained neural networks can demonstrate memorization-independent abstract reasoning

Tomer Barak, Yonatan Loewenstein

TL;DR

This study examines an artificial neural network (ANN) model in which the weights of a naive (untrained) network are optimized while solving a problem, using only the problem data itself rather than any prior knowledge, and finds that the model performs relatively well.

Abstract

The nature of abstract reasoning is a matter of debate. Modern artificial neural network (ANN) models, such as large language models, demonstrate impressive success when tested on abstract reasoning problems. However, it has been argued that their success reflects some form of memorization of similar problems (data contamination) rather than a general-purpose abstract reasoning capability. This concern is supported by evidence of brittleness and by the requirement of extensive training. In our study, we explored whether abstract reasoning can be achieved using the toolbox of ANNs without prior training. Specifically, we studied an ANN model in which the weights of a naive network are optimized during the solution of the problem, using the problem data itself rather than any prior knowledge. We tested this modeling approach on visual reasoning problems and found that it performs relatively well. Crucially, this success does not rely on memorization of similar problems. We further propose an explanation of how it works. Finally, as problem solving is performed by changing the ANN weights, we explored the connection between problem solving and the accumulation of knowledge in the ANNs.
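The core idea, fitting a freshly initialized network to a single problem and reading the answer off the optimized network, can be illustrated with a deliberately tiny stand-in. This is a sketch under stated assumptions, not the authors' model: each image is reduced to a short feature vector (a predictive value plus a constant feature), the "network" is a single linear unit, and the self-supervised objective is to predict each context image's position in the sequence; the chosen answer is the candidate whose predicted position is closest to the next one. The function `solve_by_test_time_optimization`, its helper `predict`, and their parameters are hypothetical.

```python
import random


def predict(weights, bias, x):
    """Output of a single linear unit: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def solve_by_test_time_optimization(context, candidates, lr=0.2, steps=2000, seed=0):
    """Fit a freshly initialized unit to one problem's context, then answer it.

    Hypothetical objective: predict each context image's position (0, 1, 2, ...)
    in the sequence. Only this problem's own images enter the loss.
    """
    rng = random.Random(seed)
    dim = len(context[0])
    # Naive "network": small random weights, no pretraining on other problems.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    bias = 0.0
    n = len(context)
    for _ in range(steps):
        # Full-batch gradient of the mean squared error over the context images.
        grad_w = [0.0] * dim
        grad_b = 0.0
        for position, x in enumerate(context):
            err = predict(weights, bias, x) - position
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    # The correct candidate should land at the next position in the sequence.
    next_position = n
    scores = [abs(predict(weights, bias, c) - next_position) for c in candidates]
    return scores.index(min(scores))


# Toy problem: feature 0 is the predictive value (rising by 0.2 per image),
# feature 1 is a constant, non-predictive feature.
context = [[0.2, 1.0], [0.4, 1.0], [0.6, 1.0]]
candidates = [[1.0, 1.0], [0.2, 1.0], [0.8, 1.0], [0.4, 1.0]]
print(solve_by_test_time_optimization(context, candidates))  # → 2
```

Nothing is carried over between problems: every call starts from new random weights and sees only its own context images. The learning rate is tuned to the scale of these toy features. If random distractor features are appended, the fit becomes underdetermined (few context images, many varying dimensions) and some weight typically lands on the distractors, which in this toy setting is one way to see why distractors can lower accuracy.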

Paper Structure

This paper contains 23 sections, 3 equations, and 8 figures.

Figures (8)

  • Figure 1: Visual reasoning problems. The problems are characterized by the Predictive Features (PF), which can be the color (a-b), number (c), or size (d) of the abstract shapes. The values of the predictive features increase linearly along the sequence. The remaining (non-predictive) features are either constant or random. We refer to the random features as Distractors; their number determines the problem difficulty. Note: the shapes' type and arrangement are always non-predictive and can be either constant or distracting. In all problems shown in this figure, the correct choice is $3$.
  • Figure 2: Vanilla model performance. The performance of naive ANNs on the three Predictive Features (PFs): Color (left), Number (center), and Size (right). For each predictive feature, we tested the networks on 16 test conditions in which the predictive feature changed linearly along the sequence and the non-predictive features were either distractors (marked according to the legend) or constant (not marked). Each test condition included $500$ randomly generated problems. Error bars are 95% Confidence Intervals (CI). The black line and its shade are the average accuracy per difficulty and the corresponding 95% CI. The dashed line denotes the chance level for problems with four choice images ($0.25$).
  • Figure 3: Feature correlations of the encoder's FC layers. The average absolute correlations of the encoders' FC layers with (a) the specific predictive feature of the problems they solved (Color, Number, or Size), and (b) the other two, non-predictive, features among Color, Number, and Size. Error shades represent the 95% CI, based on the standard error of the means. The calculation of the correlations is detailed in the Methods section.
  • Figure 4: The effect of distractors on accuracy. The figure depicts the relationship between the ratio of the absolute correlation with the relevant Predictive Feature ($|\rho_{PF}|$) to the absolute correlation with the Distracting feature ($|\rho_{Dis}|$), and its effect on the networks' accuracy on problems with that predictive feature and the corresponding distracting feature ($p_{PF,Dis}$). The predictive features were Color (dark gray), Number (medium gray), or Size (light gray). The distracting features were Color (square), Number (triangle), or Size (circle). Error bars represent the 95% CI. The black dashed line depicts a linear regression fit.
  • Figure 5: Problem-solving mechanism. (a) Activity of two example neurons at the output of a network's convolutional layers (before optimization) when presented with the example problem shown in the inset. The blue neuron has a large covariance with the problem's image order, and the orange neuron has a small one. The neurons' L2 gradient norms correlate with their respective image-order covariances. (b) In this example network, the L2 gradient norms of the convolutional layers' output neurons are strongly correlated with their image-order covariances ($\rho_{\text{example}}=0.97$). The two example neurons presented in (a) are highlighted. (c) Distribution of the correlations between L2 gradient norms and the images' sequence order across all problems. (d) The absolute correlation of the encoder's FC layers with the sequence order during the optimization process. Error shades represent 95% CI. Neurons' covariance and correlation calculations are explained in Methods.
  • ...and 3 more figures
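Why gradient norms might track image-order covariance (Figure 5a-b) can be seen in a simplified setting. Assume, hypothetically, that each neuron's activity $h_{jk}$ on image $k$ feeds a linear readout $y_k = \sum_j w_j h_{jk} + b$ trained with the squared error $L = \frac{1}{K}\sum_k (y_k - t_k)^2$ to predict the image position $t_k$. At $w = 0$ with $b = \bar t$, the gradient is $\partial L/\partial w_j = -\frac{2}{K}\sum_k (t_k - \bar t)\,h_{jk} = -2\,\mathrm{Cov}(h_j, t)$, so gradient magnitude is exactly proportional to order covariance. The sketch below checks this on a random, untrained ReLU layer; the helpers `order_covariance`, `readout_gradient`, and `pearson` and the toy "images" are assumptions, not the paper's code. In the real multi-layer network the relationship is only approximate.

```python
import random


def mean(values):
    return sum(values) / len(values)


def order_covariance(activity, order):
    """Population covariance between one neuron's activity and the image order."""
    ma, mo = mean(activity), mean(order)
    return sum((a - ma) * (o - mo) for a, o in zip(activity, order)) / len(order)


def readout_gradient(activity, order):
    """dL/dw_j for L = mean_k (y_k - order_k)^2, at w = 0 and bias = mean(order)."""
    bias = mean(order)  # with w = 0, every prediction y_k equals the bias
    return 2.0 * sum((bias - o) * a for a, o in zip(activity, order)) / len(order)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
n_images, n_inputs, n_neurons = 4, 16, 32
order = list(range(n_images))
# Toy "images": random vectors with a drift along the sequence.
drift = [rng.gauss(0, 1) for _ in range(n_inputs)]
images = [[rng.gauss(0, 1) + k * d for d in drift] for k in order]
# Random, untrained ReLU layer standing in for the convolutional output neurons.
weights = [[rng.gauss(0, 1) / n_inputs ** 0.5 for _ in range(n_inputs)] for _ in range(n_neurons)]
activity = [[max(0.0, sum(w * x for w, x in zip(row, img))) for img in images] for row in weights]

covs = [order_covariance(a, order) for a in activity]
grads = [readout_gradient(a, order) for a in activity]
rho = pearson([abs(g) for g in grads], [abs(c) for c in covs])
print(round(rho, 6))  # → 1.0, since |gradient| = 2 |covariance| for every neuron
```

The perfect correlation here is a property of the simplified linear-readout assumption; the paper reports a strong but imperfect correlation ($\rho_{\text{example}}=0.97$) for its actual network.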