Self Expanding Convolutional Neural Networks

Blaise Appolinary; Alex Deaconu; Sophia Yang; Qingze Li

TL;DR

The paper tackles fixed architectures and high computational cost in CNNs by introducing Self Expanding CNNs (SECNN) that grow during training using a natural expansion score. The expansion criterion combines the gradient, the inverse Fisher information, and a regularization term, with practical implementation via Empirical Fisher; new layers are added with identity initialization to preserve learned representations. Key contributions include a modular block-based CNN design with per-block capacity, a principled decision process for when and what to expand, and an evaluation on CIFAR-10 showing competitive accuracy with far fewer parameters and without restarting training. This approach offers an eco-friendly, scalable path toward adaptive vision models that can adjust complexity to task demands while reducing computational and energy overhead.
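To make the expansion criterion more concrete, here is a minimal NumPy sketch of a natural expansion score computed with an empirical Fisher matrix. It assumes the SENN-style form $\eta = g^\top F^{-1} g$, where $g$ is the mean gradient and $F$ is estimated from per-example gradients. This summary does not give the exact form of the regularization term, so the diagonal damping used here is an assumption, and the function name `natural_expansion_score` is hypothetical rather than taken from the paper.

```python
import numpy as np

def natural_expansion_score(per_example_grads, damping=1e-3):
    """Sketch of eta = g^T F^{-1} g using an empirical Fisher estimate.

    per_example_grads: array-like of shape (N, P), one flattened
    gradient per training example.
    damping: assumed regularization added to the diagonal of F so the
    linear solve is well posed (not the paper's stated term).
    """
    G = np.asarray(per_example_grads, dtype=float)
    n, p = G.shape
    g = G.mean(axis=0)                           # mean gradient
    fisher = G.T @ G / n + damping * np.eye(p)   # empirical Fisher + damping
    # Solve F x = g instead of forming the inverse explicitly.
    return float(g @ np.linalg.solve(fisher, g))
```

One plausible use is to compute the score for the current network and for a candidate expanded network, and to accept the expansion only when the score increases by more than a chosen threshold. The thresholding rule itself is not described in this summary.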

Abstract

In this paper, we present a novel method for dynamically expanding Convolutional Neural Networks (CNNs) during training, aimed at meeting the increasing demand for efficient and sustainable deep learning models. Our approach, drawing on the seminal work on Self-Expanding Neural Networks (SENN), employs a natural expansion score as the expansion criterion to address the common issue of over-parameterization in deep convolutional neural networks, so that the model's complexity is tuned to the task's specific needs. A significant benefit of this method is its eco-friendly nature: it removes the need to train multiple models of different sizes. Instead, a single model is dynamically expanded and checkpoints are extracted at various complexity levels, which reduces computational resource use and energy consumption and shortens the development cycle by offering diverse model complexities from a single training session. We evaluate our method on the CIFAR-10 dataset. Our experimental results validate this approach, demonstrating that dynamically adding layers not only maintains but also improves CNN performance, underscoring the effectiveness of our expansion criterion. This approach marks a considerable advancement in developing adaptive, scalable, and environmentally considerate neural network architectures, addressing key challenges in the field of deep learning.

Paper Structure (13 sections, 3 equations, 4 figures)

This paper contains 13 sections, 3 equations, 4 figures.

Introduction
Methodology
Natural Expansion Score
Adding a New Layer
Architecture Design, Initial Configuration, and Expansion Criteria
When, Where, and What to Expand
Model Expansion Strategy and Implementation
Training
Training Dataset
Training Method
Results
Discussion
Conclusion

Figures (4)

Figure 1: The model architecture. Each block includes a convolutional layer, a batch normalization layer, and a Leaky ReLU activation. The blocks are separated by pooling layers, denoted by p. We include a skip connection from the first block to the output of the final block. We set the maximum capacity of each block to $N$. During training, the network dynamically expands by either adding an identity convolutional layer or increasing the number of channels in a block, provided the block's capacity is not exceeded. These expansions occur when the network identifies a need for increased complexity to improve performance.
Figure 2: The 10 classes of the CIFAR-10 dataset, with sample images from each class [10].
Figure 3: Table displaying the number of parameters required to achieve different validation accuracies on CIFAR-10 over 5 different trials with the same hyperparameters.
Figure 4: Graph showing our best and smallest models compared to other models under 2M parameters [17].
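
The identity convolutional layer mentioned in the Figure 1 caption can be illustrated with a small NumPy sketch. It builds a kernel whose only nonzero weights sit at the spatial center of each channel's own filter, so a stride-1, zero-padded convolution returns its input unchanged. The helper names `identity_conv_kernel` and `conv2d_same` are hypothetical, and the sketch covers only the convolution itself; how the accompanying batch normalization and Leaky ReLU are handled so that the whole block preserves its input is not described in this summary.

```python
import numpy as np

def identity_conv_kernel(channels, kernel_size=3):
    """Kernel of shape (C_out, C_in, k, k) that acts as the identity map."""
    assert kernel_size % 2 == 1, "an odd kernel size is assumed"
    w = np.zeros((channels, channels, kernel_size, kernel_size))
    center = kernel_size // 2
    for ch in range(channels):
        w[ch, ch, center, center] = 1.0   # pass channel ch straight through
    return w

def conv2d_same(x, w):
    """Naive stride-1 cross-correlation with zero padding.

    x: (C_in, H, W), w: (C_out, C_in, k, k). Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.zeros((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]            # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out
```

Because the new layer starts as an identity map, inserting it does not change the function the convolution stack computes at the moment of expansion, which is what lets training continue without a restart.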