Hybrid Reinforcement Learning Framework for Mixed-Variable Problems
Haoyan Zhai, Qianli Hu, Jiangning Chen
TL;DR
The work tackles mixed-variable optimization by introducing a Hybrid Reinforcement Learning Framework that uses Gradient Bandit-based RL to select discrete variables and Bayesian Optimization (BO) to tune continuous parameters, formalized as $\max_{a \in \mathcal{A}, x \in \mathcal{X}} f(a,x)$. A separate BO cache is kept for each discrete action, and these caches supply the reward signal to the Gradient Bandit, enabling efficient exploration and exploitation across the mixed space. Empirical results on synthetic benchmarks and real-world hyperparameter tuning (e.g., XGBoost on Walmart data) show that the approach often outperforms vanilla RL, random search, and standalone BO, with more reliable convergence and lower solution variance. The framework's modular design suggests broad applicability and possible extensions to other RL and optimization techniques, offering a flexible path toward unified mixed-variable optimization.
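For reference, the standard textbook gradient bandit update is shown below. It is stated here as an assumption about the form the discrete selector takes. $H_t(a)$ is a preference for discrete action $a \in \mathcal{A}$, $\alpha$ is a step size, and $\bar{R}_t$ is a running-average reward baseline. The reward $R_t$ for the chosen action $A_t$ would come from that action's BO cache, for example the best $f(A_t, x)$ found so far. That choice of reward is an illustrative assumption rather than a detail confirmed by this summary.

$$\pi_t(a) = \frac{e^{H_t(a)}}{\sum_{b \in \mathcal{A}} e^{H_t(b)}}, \qquad H_{t+1}(a) = H_t(a) + \alpha\,\bigl(R_t - \bar{R}_t\bigr)\bigl(\mathbb{1}[a = A_t] - \pi_t(a)\bigr)$$

The chosen action's preference rises when its reward beats the baseline. All other preferences shift in the opposite direction, in proportion to their current probabilities.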
Abstract
Optimization problems with both discrete and continuous variables are common across many disciplines. They are challenging because of their complex solution landscapes and the difficulty of navigating mixed-variable spaces effectively. To address these challenges, we introduce a hybrid Reinforcement Learning (RL) framework that combines RL for discrete variable selection with Bayesian Optimization for continuous variable adjustment. The framework's distinguishing feature is its integration of RL with continuous optimization, which lets it adapt dynamically to the mixed-variable structure of the problem. By using RL to explore the discrete decision space and Bayesian Optimization to refine the continuous parameters, our approach remains flexible while improving optimization performance. Our experiments on synthetic functions and real-world machine learning hyperparameter tuning tasks show that our method consistently outperforms traditional RL, random search, and standalone Bayesian Optimization in both effectiveness and efficiency.
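The loop below sketches how the two parts could interact: a gradient bandit picks a discrete action, and a per-action cache holds the best continuous point found for that action. Three details are assumptions made for illustration, not the authors' implementation.

- For brevity and a stdlib-only example, the per-action Bayesian Optimization step is swapped for a simple local random search around each action's incumbent.
- The bandit reward is taken to be the best value in the selected action's cache.
- The names `hybrid_optimize` and `softmax` are hypothetical.

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over the bandit preference vector H.
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_optimize(f, actions, bounds, iters=200, alpha=0.1, step=0.1, seed=0):
    """Maximize f(a, x) over discrete actions a and a scalar x in bounds.

    Hypothetical sketch: the inner continuous optimizer is a local random
    search standing in for the per-action Bayesian Optimization in the text.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    k = len(actions)
    prefs = [0.0] * k            # gradient bandit preferences H(a)
    baseline, t = 0.0, 0         # running-average reward baseline
    # Per-action cache: incumbent continuous point and its best value so far.
    cache = [{"x": rng.uniform(lo, hi), "y": -math.inf} for _ in range(k)]
    best = (None, None, -math.inf)

    for _ in range(iters):
        pi = softmax(prefs)
        i = rng.choices(range(k), weights=pi)[0]   # sample a discrete action
        c = cache[i]
        # Propose a point near this action's incumbent (stand-in for a BO step).
        x = min(hi, max(lo, c["x"] + rng.gauss(0.0, step * (hi - lo))))
        y = f(actions[i], x)
        if y > c["y"]:
            c["x"], c["y"] = x, y
        if y > best[2]:
            best = (actions[i], x, y)

        # Assumed reward: best value recorded in the chosen action's cache.
        r = c["y"]
        t += 1
        baseline += (r - baseline) / t
        # Gradient bandit preference update for every action.
        for j in range(k):
            indicator = 1.0 if j == i else 0.0
            prefs[j] += alpha * (r - baseline) * (indicator - pi[j])

    return best


if __name__ == "__main__":
    # Toy mixed-variable objective: action a shifts the optimum and adds a bonus.
    objective = lambda a, x: -(x - 0.5 * a) ** 2 + a
    print(hybrid_optimize(objective, [0, 1, 2], (0.0, 1.0)))
```

Because the reward is read from the cache rather than from the latest raw evaluation, a single unlucky continuous proposal does not penalize a good discrete action. This is one plausible reading of how per-action caches "guide the reward signal."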
