HyperQ-Opt: Q-learning for Hyperparameter Optimization

Md. Tarek Hasan

TL;DR

The paper frames hyperparameter optimization (HPO) as a sequential decision problem and applies Q-learning to learn hyperparameter policies, as an alternative to traditional methods such as grid search, random search, and SMBO. It surveys two key studies, Hyp-RL by H.S. Jomaa et al. and the work of Qi et al., both of which model HPO as a Markov Decision Process and use Q-learning to improve search efficiency under a limited number of trials. Both works employ discrete hyperparameter grids and an epsilon-greedy policy, yielding near-optimal configurations or transferable policies, yet they leave gaps in search-space design and policy justification. The work argues for policy-based optimization and highlights promising directions, including continuous search spaces and improved policy learning, to enhance scalability and efficiency in HPO.

Abstract

Hyperparameter optimization (HPO) is critical for enhancing the performance of machine learning models, yet it often involves a computationally intensive search across a large parameter space. Traditional approaches such as Grid Search and Random Search suffer from inefficiency and limited scalability, while surrogate models like Sequential Model-based Bayesian Optimization (SMBO) rely heavily on heuristic predictions that can lead to suboptimal results. This paper presents a novel perspective on HPO by formulating it as a sequential decision-making problem and leveraging Q-learning, a reinforcement learning technique, to optimize hyperparameters. The study explores the works of H.S. Jomaa et al. and Qi et al., which model HPO as a Markov Decision Process (MDP) and utilize Q-learning to iteratively refine hyperparameter settings. The approaches are evaluated for their ability to find optimal or near-optimal configurations within a limited number of trials, demonstrating the potential of reinforcement learning to outperform conventional methods. Additionally, this paper identifies research gaps in existing formulations, including the limitations of discrete search spaces and reliance on heuristic policies, and suggests avenues for future exploration. By shifting the paradigm toward policy-based optimization, this work contributes to advancing HPO methods for scalable and efficient machine learning applications.
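
To make the MDP framing concrete, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy over a discrete hyperparameter grid. It is not the formulation of Jomaa et al. or Qi et al.: the grid values, the state and action design (a state is a grid position, an action moves one hyperparameter one step), the reward, and the `toy_score` function standing in for a real validation score are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical discrete grid; the values are illustrative, not taken from the paper.
GRID = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "num_layers": [1, 2, 3, 4],
}
NAMES = list(GRID)
# Each action moves one hyperparameter one step down or up its grid axis.
ACTIONS = [(dim, delta) for dim in range(len(NAMES)) for delta in (-1, 1)]


def toy_score(state):
    """Made-up stand-in for validation accuracy (percent); peaks at grid index (2, 2)."""
    target = (2, 2)
    return 100 - 20 * sum(abs(s - t) for s, t in zip(state, target))


def step(state, action):
    """Apply an action to a grid position, clipping at the grid boundary."""
    dim, delta = ACTIONS[action]
    new = list(state)
    new[dim] = min(max(new[dim] + delta, 0), len(GRID[NAMES[dim]]) - 1)
    return tuple(new)


def q_learning_hpo(episodes=200, horizon=6, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Return the best configuration seen and its score."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))  # Q-table: state -> action values
    best_state, best_score = None, float("-inf")
    for _ in range(episodes):
        state = (0, 0)  # every episode starts from the same corner of the grid
        for _ in range(horizon):  # horizon = trial budget per episode
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])  # exploit
            next_state = step(state, action)
            reward = toy_score(next_state)  # one "trial": evaluate the new configuration
            # Standard Q-learning update toward the bootstrapped target.
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            if reward > best_score:
                best_state, best_score = next_state, reward
            state = next_state
    config = {name: GRID[name][i] for name, i in zip(NAMES, best_state)}
    return config, best_score


if __name__ == "__main__":
    print(q_learning_hpo())
```

In a real setting, each call to `toy_score` would be replaced by training and validating a model, which is why the number of trials per episode (`horizon` here) is the budget that matters.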

Paper Structure

This paper contains 5 sections, 1 equation, 2 tables, 3 algorithms.