Hybrid Reinforcement Learning Framework for Mixed-Variable Problems

Haoyan Zhai; Qianli Hu; Jiangning Chen

TL;DR

The work tackles mixed-variable optimization by introducing a Hybrid Reinforcement Learning Framework that uses Gradient Bandit-based RL to select discrete variables and Bayesian Optimization (BO) to adjust continuous parameters, formalized as $\max_{a \in \mathcal{A}, x \in \mathcal{X}} f(a,x)$. It shows how a separate BO cache for each discrete action feeds the reward signal to the Gradient Bandit, enabling efficient exploration and exploitation across the mixed space. Empirical results on synthetic benchmarks and real-world hyperparameter tuning (e.g., XGBoost on Walmart data) show that the approach often outperforms vanilla RL, random search, and standalone BO, with more reliable convergence and lower variance across runs. The framework's modular design suggests broad applicability and possible extensions to other RL and optimization techniques, offering a flexible path toward unified mixed-variable optimization.
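The interplay between the Gradient Bandit and the per-action caches can be illustrated with a minimal, dependency-free Python sketch. It assumes the textbook gradient-bandit update (softmax over preferences with a running-average reward baseline), treats the best value stored in a discrete action's cache as that action's reward, and, to keep the example short, replaces Bayesian Optimization with uniform random sampling of the continuous variable. The names `hybrid_search` and `toy`, the step size `alpha`, the inner-step count `n_inner`, and the toy objective are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def softmax(prefs):
    """Convert bandit preferences H(a) into action probabilities."""
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_search(f, n_actions, x_bounds, rl_steps=200, n_inner=3, alpha=0.1, seed=0):
    """Gradient-bandit outer loop over discrete actions with per-action caches.

    ASSUMPTION: the inner continuous step samples x uniformly at random as a
    stand-in for Bayesian Optimization; a real BO step would replace it.
    """
    rng = random.Random(seed)
    prefs = [0.0] * n_actions                     # preferences H(a)
    caches = {a: [] for a in range(n_actions)}    # per-action (x, f(a, x)) history
    baseline = 0.0                                # running mean of past rewards
    lo, hi = x_bounds

    for t in range(1, rl_steps + 1):
        probs = softmax(prefs)
        a = rng.choices(range(n_actions), weights=probs)[0]

        # Inner loop: n_inner continuous evaluations for the chosen action.
        for _ in range(n_inner):
            x = rng.uniform(lo, hi)
            caches[a].append((x, f(a, x)))

        # ASSUMPTION: reward = best value cached so far for action a.
        reward = max(y for _, y in caches[a])
        advantage = reward - baseline

        # Gradient-bandit preference update.
        for b in range(n_actions):
            if b == a:
                prefs[b] += alpha * advantage * (1.0 - probs[b])
            else:
                prefs[b] -= alpha * advantage * probs[b]
        baseline += (reward - baseline) / t

    best_a, best_x, best_y = max(
        ((b, x, y) for b, hist in caches.items() for x, y in hist),
        key=lambda item: item[2],
    )
    return best_a, best_x, best_y, softmax(prefs)


# Hypothetical toy objective: the best choice is a = 3, x = 3 (value 3).
def toy(a, x):
    return -(x - a) ** 2 + a


best_a, best_x, best_y, probs = hybrid_search(toy, n_actions=4, x_bounds=(0.0, 4.0))
```

Keeping one cache per discrete action means each action's reward reflects the best continuous setting found for it so far, so the bandit compares actions by their attainable value rather than by a single noisy evaluation.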

Abstract

Optimization problems characterized by both discrete and continuous variables are common across many disciplines, and they are challenging because of their complex solution landscapes and the difficulty of navigating mixed-variable spaces effectively. To address these challenges, we introduce a hybrid Reinforcement Learning (RL) framework that combines RL for discrete variable selection with Bayesian Optimization for continuous variable adjustment. The framework integrates RL with continuous optimization techniques so that it can adapt dynamically to the problem's mixed-variable nature. By employing RL to explore discrete decision spaces and Bayesian Optimization to refine continuous parameters, our approach is flexible and also improves optimization performance. Our experiments on synthetic functions and real-world machine learning hyperparameter tuning tasks show that our method consistently outperforms traditional RL, random search, and standalone Bayesian Optimization in terms of effectiveness and efficiency.
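The continuous half of the framework can be sketched as a few Bayesian Optimization steps for one fixed discrete action: a Gaussian-process surrogate is fitted to that action's cached observations, and expected improvement is maximized over a grid of candidates. This pure-Python sketch assumes a one-dimensional continuous variable, an RBF kernel with fixed hyperparameters, and a zero prior mean; the helper names (`rbf`, `solve`, `gp_posterior`, `expected_improvement`, `bo_steps`) and the toy objective `g` are hypothetical, and the paper's actual surrogate, kernel, and acquisition function may differ.

```python
import math


def rbf(x1, x2, length=0.5, var=1.0):
    """Squared-exponential kernel with fixed (assumed) hyperparameters."""
    return var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def gp_posterior(xs, ys, x, noise=1e-6):
    """GP posterior mean and standard deviation at x (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the incumbent `best` (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bo_steps(g, cache, bounds, n=3, grid=200):
    """Run n BO steps on g, appending each new (x, g(x)) to the action's cache."""
    lo, hi = bounds
    candidates = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    for _ in range(n):
        xs = [x for x, _ in cache]
        ys = [y for _, y in cache]
        best = max(ys)
        x_next = max(candidates,
                     key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        cache.append((x_next, g(x_next)))
    return cache


def g(x):
    return -(x - 0.3) ** 2  # hypothetical objective for one fixed discrete action


cache = [(0.0, g(0.0)), (1.0, g(1.0))]
bo_steps(g, cache, bounds=(0.0, 1.0), n=5)
```

Because `bo_steps` mutates the cache it is given, calling it with a different list per discrete action reproduces the "one cache per action" idea, and the best value in each list can serve as that action's reward.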

Paper Structure (14 sections, 4 equations, 4 figures, 1 algorithm)

Introduction
Related Work
Methodology
Problem Definition
Hybrid Reinforcement Learning Framework
Reinforcement Learning for Discrete Variable Optimization
Continuous Optimization Techniques
Experiments
Experimental Setup
Datasets and Scenarios
Synthetic Functions
Real-World Scenario
Results
Conclusion and Future Research

Figures (4)

Figure 1: Overview of our hybrid Reinforcement Learning system. Each RL iteration is augmented by $n$ steps of Bayesian Optimization over the continuous variables, and iterations continue until the stop criterion is met.
Figure 2: The results for synthetic functions. Each graph is one optimization trajectory: the x-axis is the number of iterations, and the y-axis is the absolute gap between the searched maximum and the known global maximum. The red dots are the results of individual steps, and the blue curve is the rolling average of 50 adjacent points. For hybrid RL, the number $n$ of Bayesian Optimization steps performed in each reinforcement learning step must be specified: $n=3$ for the Shekel function, $n=3$ for the Composition Function, and $n=2$ for the Sine Permutation Function.
Figure 3: The results for the machine learning hyperparameter tuning problem. Each curve is the rolling average of 50 adjacent points. The x-axis is the number of iterations, and the y-axis is the reward. For Random Search, Bayesian Opt, and RL, the reward is the objective function value at each step; for Hybrid RL, it is defined by the paper's reward equation. In Hybrid RL, the number of Bayesian Opt steps is $n=2$.
Figure 4: Statistical results for all problems, with each experiment repeated 5 times for the synthetic functions and 10 times for the machine learning problem using different random seeds. Hybrid RL gives the lowest standard deviation in all problems and achieves the optimal value in most cases.
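The quantities plotted in Figure 2 can be computed from a trajectory of per-step values with a short sketch. It assumes a trailing window of 50 points for the rolling average (the caption does not say whether the window is trailing or centered) and lets the first few entries average over fewer points; the function names `absolute_gaps` and `rolling_average` are hypothetical.

```python
def absolute_gaps(values, global_max):
    """Absolute gap between each searched value and the known global maximum."""
    return [abs(global_max - v) for v in values]


def rolling_average(points, window=50):
    """Trailing rolling mean; the first window-1 entries average fewer points."""
    out = []
    total = 0.0
    for i, p in enumerate(points):
        total += p
        if i >= window:
            total -= points[i - window]  # drop the point leaving the window
        out.append(total / min(i + 1, window))
    return out


# Small example with window=2 instead of the figure's 50.
gaps = absolute_gaps([8.0, 9.0, 9.5, 10.0], global_max=10.0)
smoothed = rolling_average(gaps, window=2)
```

With `window=50`, `rolling_average(gaps)` corresponds to the blue curve under the trailing-window assumption, while `gaps` itself corresponds to the red dots.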
