On Learnable Parameters of Optimal and Suboptimal Deep Learning Models

Ziwei Zheng, Huizhi Liang, Vaclav Snasel, Vito Latora, Panos Pardalos, Giuseppe Nicosia, Varun Ojha

TL;DR

Successful networks, irrespective of dataset or model, are invariably similar to other successful networks in their converged weight statistics and distribution, while poorly performing networks vary in their weights.

Abstract

We scrutinize the structural and operational aspects of deep learning models, focusing on learnable parameter (weight) statistics, distribution, node interaction, and visualization. By establishing correlations between variance in weight patterns and overall network performance, we investigate the varying (optimal and suboptimal) performances of various deep learning models. Our empirical analysis extends across widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and various deep learning models such as deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformers (ViTs), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments on these diverse architectures, we shed light on the critical factors that influence the functionality and efficiency of DNNs. Our findings reveal that successful networks, irrespective of datasets or models, are invariably similar to other successful networks in their converged weight statistics and distribution, while poorly performing networks vary in their weights. In addition, our research shows that the learnable parameters of widely varied deep learning models such as DNNs, CNNs, and ViTs exhibit similar learning characteristics.
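As a rough illustration of the kind of weight analysis the abstract describes, the following sketch computes summary statistics of a network's weights and compares the normalized weight distributions of two networks. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names (`weight_stats`, `normalized_histogram`, `histogram_distance`), the choice of statistics, the min-max normalization, the bin count, and the use of total variation distance are all hypothetical choices made for illustration, and the demo weights are synthetic.

```python
import numpy as np


def weight_stats(weights):
    """Summary statistics of a flattened weight array (illustrative choice of statistics)."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    mean = w.mean()
    std = w.std()
    centered = w - mean
    # Skewness and excess kurtosis from standardized moments; guard against std == 0.
    skew = (centered ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centered ** 4).mean() / std ** 4 - 3.0 if std > 0 else 0.0
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}


def normalized_histogram(weights, bins=50):
    """Histogram of min-max normalized weights, returned as a probability vector."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    span = w.max() - w.min()
    # Assumption: min-max scaling to [0, 1]; the paper's normalization may differ.
    scaled = (w - w.min()) / span if span > 0 else np.zeros_like(w)
    counts, _ = np.histogram(scaled, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


def histogram_distance(p, q):
    """Total variation distance between two probability vectors (0 means identical)."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


# Synthetic stand-ins for the converged weights of two networks (not real model weights).
rng = np.random.default_rng(0)
weights_a = rng.normal(loc=0.0, scale=0.05, size=10_000)
weights_b = rng.normal(loc=0.0, scale=0.05, size=10_000)

stats_a = weight_stats(weights_a)
stats_b = weight_stats(weights_b)
distance = histogram_distance(
    normalized_histogram(weights_a), normalized_histogram(weights_b)
)
```

In practice the arrays would be the flattened parameters of trained models, for example gathered per layer, and a small distance between two high-accuracy models would be consistent with the similarity the abstract reports.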

Paper Structure (8 sections, 13 figures, 2 tables)

Figures (13)

  • Figure 1: Convergence of various DNN, CNN, and ViT models on the MNIST, FMNIST, and CIFAR-10 datasets. The x-axis and y-axis represent the training epoch and the training loss, respectively. Color represents the test accuracy of each model. Model convergence profiles are categorized into low-, mid-, and high-accuracy groups. For DNNs, the group 'non' (in blue) comprises models whose weights happened to be randomly initialized around zero; these models did not converge over the epochs because the gradient fluctuated around zero and gradients vanished toward earlier layers, and they were not part of the subsequent analysis. Model characterization is therefore performed for successful networks (high accuracy), marginally successful networks (mid accuracy), and failed networks (low accuracy).
  • Figure 2: DNN weight analysis for optimal and suboptimal network characterization.
  • Figure 3: CNN weight analysis for optimal and suboptimal network characterization.
  • Figure 4: ViT weight analysis for optimal and suboptimal network characterization.
  • Figure 5: DNN normalized weight distribution for model characterization.
  • ...and 8 more figures