Large Language Model Enhanced Particle Swarm Optimization for Hyperparameter Tuning for Deep Learning Models

Saad Hameed, Basheer Qolomany, Samir Brahim Belhaouari, Mohamed Abdallah, Junaid Qadir, Ala Al-Fuqaha

TL;DR

This paper addresses the costly problem of hyperparameter tuning in deep learning by fusing Large Language Models (LLMs) with Particle Swarm Optimization (PSO). The authors propose an LLM-guided PSO where ChatGPT-3.5 and Llama3 supply improved particle positions after a brief PSO run, accelerating convergence while reducing model evaluations. They validate the approach on three tasks: the Rastrigin benchmark, LSTM-based AQI regression, and CNN-based material classification, reporting 20–60% reductions in model calls without sacrificing accuracy. The work demonstrates potential resource savings for DL optimization in resource-constrained settings and outlines future directions including multi-objective optimization and broader LLM testing.
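For reference, the Rastrigin benchmark and the textbook inertia-weight PSO update are written out below. These are the standard forms from the optimization literature, not necessarily the paper's own equations. The coefficients $w$, $c_1$, $c_2$, the random factors $r_1, r_2$, and the search bounds are assumptions of this sketch.

```latex
f(\mathbf{x}) = 10n + \sum_{i=1}^{n} \left[ x_i^2 - 10\cos(2\pi x_i) \right],
\qquad x_i \in [-5.12,\, 5.12], \qquad f(\mathbf{0}) = 0

\mathbf{v}_i^{t+1} = w\,\mathbf{v}_i^{t}
  + c_1 r_1 \left(\mathbf{p}_i - \mathbf{x}_i^{t}\right)
  + c_2 r_2 \left(\mathbf{g} - \mathbf{x}_i^{t}\right),
\qquad
\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1}
```

Here $\mathbf{p}_i$ is particle $i$'s personal best, $\mathbf{g}$ is the swarm's global best, and $r_1, r_2 \sim U(0,1)$. The many regularly spaced local minima of $f$ are what make it a useful stress test for swarm convergence.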

Abstract

Determining the ideal architecture for deep learning models, such as the number of layers and neurons, is a difficult and resource-intensive process that frequently relies on manual tuning or computationally costly optimization approaches. While Particle Swarm Optimization (PSO) and Large Language Models (LLMs) have each been applied in optimization and deep learning, their combined use for enhancing convergence in numerical optimization tasks remains underexplored. Our work addresses this gap by integrating LLMs into PSO to reduce model evaluations and improve convergence in deep learning hyperparameter tuning. The proposed LLM-enhanced PSO method uses LLMs (specifically ChatGPT-3.5 and Llama3) to improve the efficiency and convergence of PSO, so that target objectives are reached faster. It speeds up search-space exploration by replacing underperforming particle positions with the best suggestions offered by the LLMs. Comprehensive experiments show that the method significantly improves convergence rates and lowers computational costs in three scenarios: (1) optimizing the Rastrigin function, (2) using Long Short-Term Memory (LSTM) networks for time series regression, and (3) using Convolutional Neural Networks (CNNs) for material classification. Depending on the application, computational cost, measured in model calls, is reduced by 20% to 60% compared to traditional PSO. Llama3 achieved a 20% to 40% reduction in model calls for regression tasks, whereas ChatGPT-3.5 reduced model calls by 60% for both regression and classification tasks, all while preserving accuracy and error rates. The methodology offers an efficient and effective approach to optimizing deep learning models, with substantial computational savings across the applications studied.
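The replacement of underperforming particles described in the abstract can be sketched as follows, assuming NumPy. The function `mock_llm_suggest` is hypothetical: it stands in for ChatGPT-3.5 or Llama3 by sampling Gaussian perturbations around the global best. This makes the sketch a PSO with periodic local resampling, not a real LLM integration. The replacement schedule (`llm_every`), the number of replaced particles (`n_replace`), the stopping `target`, and all PSO coefficients are illustrative assumptions, not values from the paper.

```python
import numpy as np


def rastrigin(x: np.ndarray) -> float:
    """Standard Rastrigin function; global minimum f(0) = 0."""
    return float(10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))


def mock_llm_suggest(gbest: np.ndarray, n: int, rng: np.random.Generator,
                     scale: float = 0.1) -> np.ndarray:
    """HYPOTHETICAL stand-in for an LLM query: samples n positions near the
    global best. A real system would prompt ChatGPT-3.5 or Llama3 with the
    swarm state and parse the returned positions."""
    return gbest + rng.normal(0.0, scale, size=(n, gbest.size))


def llm_enhanced_pso(f, dim=2, n_particles=20, max_iters=200, target=1e-2,
                     n_replace=5, llm_every=10, w=0.7, c1=1.5, c2=1.5,
                     bound=5.12, seed=0):
    """Inertia-weight PSO that periodically swaps its worst particles for
    suggested positions. Returns (global best, best score, objective calls)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, (n_particles, dim))
    v = np.zeros_like(x)
    scores = np.array([f(p) for p in x])
    calls = n_particles  # every objective evaluation counts as a "model call"
    pbest, pbest_score = x.copy(), scores.copy()
    g = pbest[np.argmin(pbest_score)].copy()

    for t in range(1, max_iters + 1):
        if pbest_score.min() <= target:
            break  # desired error reached
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, -bound, bound)
        scores = np.array([f(p) for p in x])
        calls += n_particles

        if n_replace > 0 and t % llm_every == 0:
            # Replace the worst-scoring particles with suggested positions.
            worst = np.argsort(scores)[-n_replace:]
            x[worst] = np.clip(mock_llm_suggest(g, n_replace, rng), -bound, bound)
            v[worst] = 0.0
            scores[worst] = [f(p) for p in x[worst]]
            calls += n_replace

        # Update personal bests and the global best.
        improved = scores < pbest_score
        pbest[improved] = x[improved]
        pbest_score[improved] = scores[improved]
        g = pbest[np.argmin(pbest_score)].copy()

    return g, float(pbest_score.min()), calls
```

Calling `llm_enhanced_pso(rastrigin, n_replace=0)` runs plain PSO from the same initial swarm, so comparing the returned `calls` gives a rough analogue of the model-call comparison. On a toy objective the difference depends strongly on the seed and on the quality of the suggestions.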

Paper Structure

This paper contains 24 sections, 11 equations, 9 figures, 7 tables, 2 algorithms.

Figures (9)

  • Figure 1: The two algorithms. First phase (Algorithm 1): standard PSO optimizes the number of DL model layers and neurons (shaded area). Second phase (Algorithm 2): key variables (Global Best Layer, Global Best Neuron, and Global Best RMSE) are computed, and after the initial PSO-based DL model setup, the LLM suggests improved particle positions and velocities for faster convergence. 'test_score' (from Algorithm 1) and 'y' (from Algorithm 2) are compared to decide whether to query the LLM again.
  • Figure 2: Flow chart of the experimental setup, including data collection [hameed2023deep], preprocessing, fusion, DL model training, and next-day prediction. PSO or LLM-driven PSO performs hyperparameter optimization, which stops once the desired error is reached.
  • Figure 3: Sample images of recyclable and organic materials used for CNN classification. The dataset contains 25,000 images, with 60% used for training and 40% for testing.
  • Figure 4: Box plot of the number of iterations required for Rastrigin function convergence using PSO with varying particle counts and exploration/exploitation settings (the y-axis does not start at zero). Each box summarizes 10 runs; the mean ($\mu$) and standard deviation ($\sigma$) of the iteration counts are shown.
  • Figure 5: Box plot of the number of iterations required for Rastrigin function convergence using LLM-driven PSO with varying particle counts (the y-axis does not start at zero). Each box summarizes 10 runs; the mean ($\mu$) and standard deviation ($\sigma$) are shown. (a) ChatGPT-3.5 results. (b) Llama3 results.
  • ...and 4 more figures
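The phase-2 decision in the Figure 1 caption can be sketched as a small loop. This is a hedged illustration, not the authors' Algorithm 2. `toy_rmse` and `toy_llm` are hypothetical stand-ins for training an LSTM or CNN and for querying ChatGPT-3.5 or Llama3. The acceptance rule (`y < test_score`) and the stopping test (`test_score <= target`) are assumptions about how the comparison is used, since the caption does not spell them out. Phase 1 (a brief standard PSO run) is assumed to have produced `best_cfg` and `test_score`. Velocity suggestions are omitted, so only (layers, neurons) configurations are proposed.

```python
import random
from typing import Callable, List, Tuple

Config = Tuple[int, int]              # (number of layers, neurons per layer)
History = List[Tuple[Config, float]]  # evaluated configurations and their RMSE


def toy_rmse(cfg: Config) -> float:
    """HYPOTHETICAL stand-in for training a DL model and returning its RMSE;
    its minimum (0.1) is at 3 layers and 64 neurons."""
    layers, neurons = cfg
    return 0.1 + 0.05 * abs(layers - 3) + 0.001 * abs(neurons - 64)


def toy_llm(history: History, rng: random.Random) -> Config:
    """HYPOTHETICAL stand-in for an LLM query: perturbs the best configuration
    seen so far. A real system would send `history` in a prompt and parse
    the suggested layers and neurons from the reply."""
    (layers, neurons), _ = min(history, key=lambda h: h[1])
    return (max(1, layers + rng.choice([-1, 0, 1])),
            max(1, neurons + rng.choice([-16, -8, 0, 8, 16])))


def llm_refine(best_cfg: Config, test_score: float,
               evaluate: Callable[[Config], float],
               suggest: Callable[[History, random.Random], Config],
               target: float, max_queries: int = 10, seed: int = 0):
    """Phase-2 sketch: starting from the phase-1 global best (best_cfg,
    test_score), query for a suggestion, evaluate it to obtain y, and compare
    y with test_score to decide whether to keep it and whether to query again."""
    rng = random.Random(seed)
    history: History = [(best_cfg, test_score)]
    calls = 0  # model evaluations spent in phase 2
    for _ in range(max_queries):
        if test_score <= target:
            break  # desired error reached: stop querying
        cfg = suggest(history, rng)
        y = evaluate(cfg)
        calls += 1
        history.append((cfg, y))
        if y < test_score:
            best_cfg, test_score = cfg, y  # accept the improved suggestion
    return best_cfg, test_score, calls
```

For example, `llm_refine((5, 32), toy_rmse((5, 32)), toy_rmse, toy_llm, target=0.11, max_queries=50)` refines a poor starting configuration. Each call to `evaluate` would be a full training run in the real setting, which is why the sketch counts them in `calls`.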