
Minimum number of neurons in fully connected layers of a given neural network (the first approximation)

Oleg I. Berngardt

TL;DR

The paper tackles the problem of determining the minimal width of fully connected layers for a neural network solving a given task, without performing multiple full trainings for different widths. It introduces a method that starts from a wide, cross-validated network and uses a truncated SVD autoencoder inserted after each studied layer to probe the layer's latent dimension, with the minimum width $M$ tied to the rank of the layer's output matrix $Y^{(n)}$ and validated via statistical equivalence. Experiments on MNIST variants and other datasets show that the identified minima can closely match original performance while being substantially smaller than universal bounds, though the approach is stochastic and does not guarantee trainability. Overall, the work provides a first approximation for per-layer neuron count that behaves as an intrinsic property of the solution, offering a potential pathway for lightweight architecture optimization and compression.

Abstract

This paper presents an algorithm that searches for the minimum number of neurons in the fully connected layers of an arbitrary network solving a given problem, without requiring repeated training of the network with different numbers of neurons. The algorithm first trains an initial wide network using cross-validation over at least two folds. Then, with a truncated singular value decomposition (SVD) autoencoder inserted after the studied layer of the trained network, it searches for the minimum number of neurons in inference-only mode. It is shown that the minimum number of neurons in a fully connected layer can be interpreted not as a network hyperparameter tied to the other hyperparameters of the network, but as an internal (latent) property of the solution, determined by the network architecture, the training dataset, the layer position, and the quality metric used. The minimum number of neurons can therefore be estimated for each hidden fully connected layer independently. The proposed algorithm is a first approximation for estimating the minimum number of neurons in a layer: on the one hand, it does not guarantee that a neural network with the found number of neurons can be trained to the required quality, and on the other hand, it searches for the minimum number of neurons within a limited class of possible solutions. The solution was tested on several datasets in classification and regression problems.
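
To make the inference-only search concrete, here is a minimal NumPy sketch of a truncated SVD autoencoder applied to a layer's output matrix. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `head` callable (standing in for the rest of the trained network after the studied layer), the higher-is-better `metric`, and the fixed tolerance `tol` are all hypothetical. In particular, the simple tolerance check replaces the paper's statistical-equivalence test over cross-validation folds, and no mean-centering is applied before the SVD.

```python
import numpy as np


def fit_right_singular_vectors(Y_fit):
    """SVD of a layer output matrix Y_fit with shape (samples, neurons).

    The rows of the returned Vt are orthonormal directions in neuron space,
    ordered by decreasing singular value.
    """
    _, _, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    return Vt


def svd_autoencode(Y, Vt, M):
    """Encode Y into M components, then decode back to the original width."""
    V_M = Vt[:M].T            # (neurons, M)
    return (Y @ V_M) @ V_M.T  # (samples, neurons)


def estimate_min_width(Y_fit, Y_eval, targets, head, metric, tol=0.0):
    """Smallest M whose autoencoded outputs keep the metric within tol.

    Assumptions (not from the paper): `head` maps layer outputs to network
    outputs, `metric(outputs, targets)` is higher-is-better, and a plain
    tolerance stands in for a statistical-equivalence test.
    """
    Vt = fit_right_singular_vectors(Y_fit)
    baseline = metric(head(Y_eval), targets)
    for M in range(1, Vt.shape[0] + 1):
        score = metric(head(svd_autoencode(Y_eval, Vt, M)), targets)
        if score >= baseline - tol:
            return M
    # Fall back to the original layer width if no smaller M qualifies.
    return Y_fit.shape[1]
```

The network is never retrained in this loop: only the projection rank `M` changes, which is why a single trained wide network suffices for the search. Fitting the SVD on one fold and evaluating on another mirrors the cross-validated setup described above, though the exact fold handling here is an assumption.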

Paper Structure (6 sections, 21 equations, 2 figures, 1 table, 2 algorithms)


Figures (2)

  • Figure 1: Architecture for searching for the minimum number of neurons in layer $n$. A) architecture of network $S$; B) architecture of network $D'$; C) architecture used to study the MNIST problem, where $p$ is a network width multiplier.
  • Figure 2: Performance and stability of the algorithm. A) predicted minimum number of neurons for different numbers of neurons in the original layer of $S$ (mean over $C$ fold combinations); B) predicted minimum number of neurons for different numbers of neurons in the original layer of $S$ (over $C$ fold combinations); C) predicted minimum number of neurons for different numbers of $C$ folds (original layer with 128 neurons); D) predicted minimum number of neurons for different dataset separation and ordering variants (original layer with 128 neurons); E) accuracy on the test dataset for the original network $S$ (red circles) and the equivalent network $D$ found with the minimal number of neurons in its layers (40 and 10 for the first and second layers, respectively): green diamonds - simple training with a constant learning rate of $10^{-3}$ and early stopping with patience 3; blue crosses - a learning rate decreasing from $10^{-3}$ to $10^{-6}$ and early stopping with patience 10.