A Linear Programming Enhanced Genetic Algorithm for Hyperparameter Tuning in Machine Learning

Ankur Sinha, Paritosh Pankaj

TL;DR

This work treats hyperparameter tuning as a bilevel optimization problem and introduces a linear-programming–enhanced fine-tuning strategy to accelerate local optimization of continuous hyperparameters within a micro genetic algorithm. By solving a Hessian-informed linear program, the method yields a steepest-descent direction that preserves lower-level optimality, enabling fast, targeted improvement of model performance. Empirical results on MNIST and CIFAR-10 show consistent gains across grid search, random search, and micro-GA, with notable improvements in validation and test accuracy when hyper local search is employed. The approach is general and can be integrated with existing hyperparameter search techniques, though its reliance on Hessian computations suggests future work to develop scalable, approximate methods. Overall, the LP-enhanced fine-tuning offers a promising, broadly applicable mechanism for improving hyperparameter optimization in machine learning.

Abstract

In this paper, we formulate the hyperparameter tuning problem in machine learning as a bilevel program. The bilevel program is solved using a micro genetic algorithm that is enhanced with a linear program. While the genetic algorithm searches over discrete hyperparameters, the linear program enhancement allows hyper local search over continuous hyperparameters. The major contribution in this paper is the formulation of a linear program that supports fast search over continuous hyperparameters, and can be integrated with any hyperparameter search technique. It can also be applied directly on any trained machine learning or deep learning model for the purpose of fine-tuning. We test the performance of the proposed approach on two datasets, MNIST and CIFAR-10. Our results clearly demonstrate that using the linear program enhancement offers significant promise when incorporated with any population-based approach for hyperparameter tuning.
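
As a reading aid, the bilevel structure described in the abstract can be written generically as below. The training objective $f$, the training set $S^{T}$, the continuous hyperparameters $\lambda_{c}$, and the model weights $w$ follow the notation of Theorem 1. The validation objective $F$ and the validation set $S^{V}$ are assumed symbols introduced only for this sketch, and the discrete hyperparameters are taken as fixed.

$$
\min_{\lambda_{c},\,w} \; F(\lambda_{c},w;S^{V}) \quad \text{subject to} \quad w \in \mathop{\rm argmin}\limits_{w'} \{ f(\lambda_{c},w';S^{T}) \}
$$

In this reading, the genetic algorithm varies the discrete hyperparameters at the upper level, while the linear program drives the local search over $\lambda_{c}$ for a given choice of discrete hyperparameters.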

Paper Structure (7 sections, 1 theorem, 16 equations, 9 figures, 3 tables)

Key Result

Theorem 1

At a given point $(\lambda_{c}^{\circ},w^{\circ})$ such that $w^{\circ} \in \mathop{\rm argmin}\limits_w \{f(\lambda_{c}^{\circ},w; S^{T})\}$, the steepest descent direction for the continuous-hyperparameter model (mod:continuousHyp) can be obtained by solving the following problem:
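
The program itself is not reproduced in this summary, so the sketch below is not the authors' formulation. It only illustrates the general idea on a hypothetical toy problem with one continuous hyperparameter and one weight: choose a direction that lowers the validation loss to first order while keeping the training-loss gradient with respect to $w$ at zero to first order, which is where the second derivatives of the training loss enter. The losses, the constants `A` and `B`, the unit bound on the hyperparameter step, and all function names are assumptions made for illustration.

```python
# Toy bilevel problem (hypothetical, not taken from the paper):
#   training loss   f(lam, w) = 0.5*(w - A)**2 + 0.5*lam*w**2
#   validation loss F(w)      = 0.5*(w - B)**2
A, B = 2.0, 1.0


def lower_level_solution(lam):
    """Exact minimizer of the toy training loss in w for a fixed lam."""
    return A / (1.0 + lam)


def validation_loss(w):
    return 0.5 * (w - B) ** 2


def descent_direction(lam, w):
    """Solve a two-variable linear program in closed form.

    minimize    F_lam * d_lam + F_w * d_w
    subject to  f_ww * d_w + f_wlam * d_lam = 0   (training optimality kept
                                                   to first order)
                -1 <= d_lam <= 1                  (assumed step bound)
    """
    F_lam = 0.0          # the toy validation loss does not depend on lam
    F_w = w - B          # dF/dw
    f_ww = 1.0 + lam     # second derivative of f in w
    f_wlam = w           # mixed second derivative of f in (w, lam)

    # The equality constraint fixes d_w as a multiple of d_lam.
    slope = -f_wlam / f_ww
    # Objective expressed in d_lam alone.
    reduced_gradient = F_lam + F_w * slope

    # A linear objective over an interval is minimized at an endpoint.
    if reduced_gradient > 0:
        d_lam = -1.0
    elif reduced_gradient < 0:
        d_lam = 1.0
    else:
        d_lam = 0.0
    return d_lam, slope * d_lam


# Start from a point where w is optimal for the training loss.
lam0 = 0.5
w0 = lower_level_solution(lam0)

d_lam, d_w = descent_direction(lam0, w0)

# Take a small step along the direction in (lam, w) space.
step = 0.1
lam1, w1 = lam0 + step * d_lam, w0 + step * d_w
```

In this toy setting the direction increases `lam` and decreases `w`, the validation loss at `w1` is lower than at `w0`, and `w1` stays close to the exact training optimum `lower_level_solution(lam1)`. With many weights and hyperparameters the equality constraint becomes a linear system in the Hessian blocks and would be handed to a general LP solver.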

Figures (9)

  • Figure 1: $(\lambda_c,w)$ space with the descent direction $(d_{\lambda_{c}}^{\ast},d_{w}^{\ast})$ at $(\lambda_{c}^{\circ},w^{\circ})$. The training and validation loss are also shown along the descent direction.
  • Figure 2: Training and validation losses while moving along the steepest descent direction for MNIST (1HP).
  • Figure 3: Training and validation losses while moving along the steepest descent direction for MNIST (2HP).
  • Figure 4: Training and validation losses while moving along the steepest descent direction for MNIST (6HP).
  • Figure 5: Validation and test accuracy as the number of regularization hyperparameters increases. The base model with no regularization hyperparameters had the lowest test accuracy of $0.6776$.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof