Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning

Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman

TL;DR

The paper tackles two obstacles to optimizing extremely large DNNs. Gradient-based methods risk getting stuck in local minima, while gradient-free meta-heuristics converge slowly and have high memory demands in such high-dimensional search spaces. It introduces Histogram-based Blocking Differential Evolution (HBDE), a two-stage approach that combines gradient-based pre-training with gradient-free fine-tuning on a histogram-blocked parameter space. The reduced dimension is $D' = D/BS$, where $D$ is the original number of parameters and $BS$ is the block size. By reconstructing the full parameter vector with an Unblocker, HBDE achieves substantial dimension reduction (e.g., from ~11.2M parameters to ~3.4K blocks in ResNet-18) and yields competitive or superior $F1$-scores on CIFAR-10 and CIFAR-100 compared with Adam and classic DE. These results demonstrate the practical viability of gradient-free optimization for very large networks and highlight its potential for multi-objective and memory-efficient training and deployment in real-world tasks.
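The blocking and unblocking steps can be illustrated with a minimal NumPy sketch. It rests on one plausible reading of the summary, and each of these choices is an assumption: parameters are grouped by value into at most $N_{max}$ equal-width histogram bins, and empty bins are discarded, leaving $N_{opt}$ blocks. Each block starts from the mean of its members, and the Unblocker copies each block's value back to every parameter in that block. The function names `histogram_block` and `unblock`, the equal-width binning, and the mean initialisation are hypothetical, not the authors' implementation.

```python
import numpy as np


def histogram_block(theta: np.ndarray, n_max: int):
    """Hypothetical histogram-based blocking of a flat parameter vector.

    Returns (block_vals, block_idx): one shared value per non-empty block and,
    for every parameter, the index of the block it belongs to.
    """
    theta = np.asarray(theta, dtype=float).ravel()
    # n_max equal-width bins spanning the observed parameter range (assumption).
    edges = np.linspace(theta.min(), theta.max(), n_max + 1)
    # Bin index in [0, n_max - 1] for each parameter, using interior edges only.
    bin_idx = np.digitize(theta, edges[1:-1])
    # Drop empty bins: renumber the occupied bins as 0..n_opt-1.
    _, block_idx = np.unique(bin_idx, return_inverse=True)
    # Initialise each block with the mean of its member parameters (assumption).
    counts = np.bincount(block_idx)
    block_vals = np.bincount(block_idx, weights=theta) / counts
    return block_vals, block_idx


def unblock(block_vals: np.ndarray, block_idx: np.ndarray) -> np.ndarray:
    """Hypothetical Unblocker: broadcast D' block values back to all D parameters."""
    return block_vals[block_idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(0.0, 0.05, size=100_000)  # stand-in for pre-trained weights
    block_vals, block_idx = histogram_block(theta, n_max=10_000)
    D, D_prime = theta.size, block_vals.size
    print(f"D = {D}, D' = {D_prime}, average block size = {D / D_prime:.1f}")
```

Under this reading, $D'$ equals the number of non-empty bins, so $BS$ in $D' = D/BS$ behaves as an average block size. With the ResNet-18 figures quoted above (~11.2M parameters, ~3.4K blocks), that works out to roughly 3,300 parameters per block.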

Abstract

Deep learning mainly relies on gradient-based optimization to train Deep Neural Network (DNN) models. Although robust and widely used, gradient-based optimization algorithms are prone to getting stuck in local minima. In the modern deep learning era, state-of-the-art DNN models have millions or billions of parameters, including weights and biases, which makes training them a huge-scale optimization problem in terms of search space. Tuning such a huge number of parameters is challenging and can lead to vanishing/exploding gradients and overfitting. Moreover, the loss functions used do not exactly represent the targeted performance metrics. Meta-heuristic algorithms offer a practical way to explore large and complex solution spaces. However, because DNNs have thousands to millions of parameters, even robust meta-heuristics such as Differential Evolution struggle to explore and converge efficiently in such high-dimensional search spaces. The result is very slow convergence and high memory demand. To tackle this curse of dimensionality, blocking was recently proposed as a technique that reduces the search-space dimension by grouping dimensions into blocks. In this study, we introduce Histogram-based Blocking Differential Evolution (HBDE), a novel approach that hybridizes gradient-based and gradient-free algorithms to optimize parameters. Experimental results demonstrate that HBDE reduces the number of parameters optimized by the meta-heuristic in the ResNet-18 model from 11M to 3K during the training/optimization phase. Evaluated on the CIFAR-10 and CIFAR-100 datasets, HBDE outperforms both the baseline gradient-based optimizer and its parent gradient-free DE algorithm. This demonstrates, for the very first time, its effectiveness with reduced computational demands.
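The gradient-free stage can be sketched as classic DE/rand/1/bin searching the $D'$-dimensional block space, with every fitness evaluation passing through an Unblocker first. This is a minimal sketch under stated assumptions. The population is seeded around the pre-trained block values. `de_optimize` and all its hyperparameters (`pop_size`, `F`, `CR`, `sigma`, `generations`) are illustrative choices rather than the paper's settings. A toy quadratic stands in for the model's validation loss or $F1$-score.

```python
import numpy as np


def de_optimize(fitness, x0, pop_size=20, F=0.5, CR=0.9, sigma=0.01,
                generations=200, seed=0):
    """Classic DE/rand/1/bin that minimises `fitness` over a D'-dim block vector.

    Hypothetical sketch: hyperparameter defaults are illustrative only.
    """
    rng = np.random.default_rng(seed)
    dim = x0.size
    # Seed the population around the pre-trained block values x0.
    pop = x0 + sigma * rng.standard_normal((pop_size, dim))
    pop[0] = x0  # keep the pre-trained solution itself in the population
    fit = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            # Binomial crossover; one forced index guarantees a mutant gene.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy one-to-one selection.
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    block_idx = rng.integers(0, 8, size=1000)  # toy mapping: D = 1000 params -> D' = 8 blocks
    target = rng.normal(size=8)[block_idx]     # toy "ideal" full parameter vector

    def fitness(b):
        theta = b[block_idx]                   # Unblocker: D' values -> D parameters
        return float(np.mean((theta - target) ** 2))

    x0 = np.zeros(8)                           # stand-in for pre-trained block values
    best, f_best = de_optimize(fitness, x0, sigma=0.5, generations=100)
    print(f"initial fitness {fitness(x0):.4f} -> best {f_best:.4f}")
```

Each candidate holds only $D'$ entries. The population therefore stores `pop_size` × $D'$ values instead of `pop_size` × $D$, which is where blocking relieves DE's memory demand. The full $D$-dimensional vector is materialised only transiently inside each fitness call.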

Paper Structure (11 sections, 7 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: ResNet-18 parameters (11.2M) blocked using a histogram with $N_{max}=10,000$.
  • Figure 2: ResNet-18 parameters (11.2M) after removing empty blocks. The final number of blocks dropped from $N_{max}=10,000$ to $N_{opt}=3,430$.
  • Figure 3: Sample images from CIFAR-10 dataset.
  • Figure 4: Gradient-based pre-training.
  • Figure 5: Convergence plots of the meta-heuristics optimizing the model, evaluated on circular super-batches of data.
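Figure 5's caption mentions evaluating fitness on circular super-batches of data. One plausible reading is that the training set is split into fixed, consecutive super-batches. Successive DE generations then evaluate the population on them in round-robin order, wrapping back to the first after the last. This reading is an assumption here, not a detail the summary confirms. The helper `circular_super_batches` and the super-batch size below are hypothetical.

```python
from itertools import cycle, islice


def circular_super_batches(n_samples: int, super_batch_size: int):
    """Yield index ranges of consecutive super-batches forever, wrapping around.

    Hypothetical helper: when one super-batch is drawn per generation,
    generation g sees super-batch g mod K, where K is the number of super-batches.
    """
    starts = range(0, n_samples, super_batch_size)
    for start in cycle(starts):
        yield range(start, min(start + super_batch_size, n_samples))


if __name__ == "__main__":
    # CIFAR-10 has 50,000 training images; the super-batch size is an assumption.
    for gen, batch in enumerate(islice(circular_super_batches(50_000, 10_000), 7)):
        print(f"generation {gen}: samples {batch.start}..{batch.stop - 1}")
```

Rotating through super-batches keeps the cost of each generation's fitness evaluation bounded. Over several generations, it still exposes the population to the whole training set.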