CatCMA: Stochastic Optimization for Mixed-Category Problems

Ryoki Hamano, Shota Saito, Masahiro Nomura, Kento Uchida, Shinichi Shirakawa

TL;DR

CatCMA tackles mixed-category black-box optimization by learning a joint search distribution over continuous and categorical variables, using information geometric optimization to update a coupled Gaussian–categorical model. It integrates CMA-ES-style rank-one updates, step-size adaptation, and ASNG-inspired learning-rate adaptation to balance the two variable types, along with an analytically derived margin correction to prevent premature convergence. Experiments show that CatCMA is more robust and performs better than state-of-the-art Bayesian MC-BBO methods such as CASMOPOLITAN and TPE, especially in higher dimensions. The work provides a practical, scalable framework for jointly optimizing continuous and categorical decisions in complex black-box settings.

Abstract

Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, so the approach of discretizing continuous variables does not work well for them. Although several Bayesian optimization methods can deal with mixed-category black-box optimization (MC-BBO), they scale poorly to high-dimensional problems and incur high internal computational cost. This paper proposes CatCMA, a stochastic optimization method for MC-BBO problems, which employs the joint probability distribution of multivariate Gaussian and categorical distributions as the search distribution. CatCMA updates the parameters of the joint probability distribution in the natural gradient direction. CatCMA also incorporates the acceleration techniques used in the covariance matrix adaptation evolution strategy (CMA-ES) and the stochastic natural gradient method, such as step-size adaptation and learning rate adaptation. In addition, we restrict the ranges of the categorical distribution parameters by a margin to prevent premature convergence and analytically derive a promising margin setting. Numerical experiments show that CatCMA performs better than state-of-the-art Bayesian optimization algorithms and is more robust to the problem dimension.
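The ideas in the abstract (a joint Gaussian and categorical search distribution, a step toward better samples, and a margin on the categorical probabilities) can be illustrated with a heavily simplified sketch. This is not the CatCMA algorithm itself: it omits covariance adaptation, step-size adaptation, and learning-rate adaptation, and it uses equal weights on the better half of the samples. The function names, the toy objective, the learning rate `lr_q`, and the margin value `q_min` are all assumptions made for illustration.

```python
import numpy as np


def sample_joint(rng, mean, cov, q, lam):
    """Draw lam candidates: x ~ N(mean, cov) plus one one-hot vector per categorical dimension."""
    x = rng.multivariate_normal(mean, cov, size=lam)      # shape (lam, n_co)
    n_ca, k = q.shape
    c = np.zeros((lam, n_ca, k))
    for n in range(n_ca):
        idx = rng.choice(k, size=lam, p=q[n])             # sampled category for each candidate
        c[np.arange(lam), n, idx] = 1.0
    return x, c


def apply_margin(q, q_min):
    """Clamp every probability to at least q_min, then rescale each row to sum to 1."""
    q = np.maximum(q, q_min)
    total = q.sum(axis=1, keepdims=True)
    k = q.shape[1]
    return q + (1.0 - total) / (total - k * q_min) * (q - q_min)


def toy_objective(x, c):
    """Hypothetical objective to minimize: squared norm plus the number of
    categorical dimensions that did not pick category 0."""
    return (x ** 2).sum(axis=1) + (1.0 - c[:, :, 0]).sum(axis=1)


def update(mean, q, x, c, fitness, lr_q=0.2, q_min=0.01):
    """Move both parameter sets toward the better half of the candidates."""
    best = np.argsort(fitness)[: len(fitness) // 2]
    mean = x[best].mean(axis=0)                           # simplified: covariance stays fixed
    q = q + lr_q * (c[best].mean(axis=0) - q)             # natural-gradient-style step on probabilities
    return mean, apply_margin(q, q_min)


def run_sketch(seed=0, iterations=30, n_co=2, n_ca=3, k=4, lam=20):
    rng = np.random.default_rng(seed)
    mean, cov = np.ones(n_co), np.eye(n_co)
    q = np.full((n_ca, k), 1.0 / k)                       # start from uniform categories
    for _ in range(iterations):
        x, c = sample_joint(rng, mean, cov, q, lam)
        mean, q = update(mean, q, x, c, toy_objective(x, c))
    return mean, q
```

Calling `run_sketch()` returns the final mean and the categorical probabilities. Because of the margin, no probability falls below `q_min`, so every category keeps a nonzero chance of being sampled.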

Paper Structure

This paper contains 26 sections, 1 theorem, 34 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Without loss of generality, assume that the categories of the optimal solution are the first categories in all dimensions and that the parameter of the categorical distribution satisfies for all $n \in \{1, \ldots, {N_{\mathrm{ca}}}\}$. Let $\lambda_\mathrm{non}$ be the random variable that counts the number of samples containing non-optimal categories among the $\lambda$ samples. If for a constant $\xi$ ...
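To make the quantity $\lambda_\mathrm{non}$ concrete, here is a small sketch under one stated assumption: the categorical dimensions are sampled independently, with probability $q_{n,1}$ of drawing the optimal (first) category in dimension $n$. Under that assumption a single sample contains at least one non-optimal category with probability $1 - \prod_n q_{n,1}$, and $\lambda_\mathrm{non}$ follows a binomial distribution over the $\lambda$ samples. The function names are hypothetical, and the sketch does not reproduce the condition or the conclusion of the proposition.

```python
import numpy as np


def prob_contains_non_optimal(q_first):
    """Probability that one sample has at least one non-optimal category,
    assuming independent dimensions with optimal-category probabilities q_first."""
    return 1.0 - float(np.prod(q_first))


def expected_lambda_non(q_first, lam):
    """Mean of lambda_non when it is Binomial(lam, prob_contains_non_optimal(q_first))."""
    return lam * prob_contains_non_optimal(q_first)
```

For example, `expected_lambda_non([0.9] * 5, 10)` gives the expected number of the 10 samples that contain a non-optimal category when each of 5 dimensions picks the optimal one with probability 0.9.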

Figures (7)

  • Figure 1: Transition of the best evaluation value, the eigenvalues of $(\sigma^{(t)})^2 \boldsymbol{C}^{(t)}$, and probability of generating the best category $\boldsymbol{q}^{(t)}_{n,1}$ in one typical trial of optimizing SphereCOM with ${N_{\mathrm{co}}} = {N_{\mathrm{ca}}} = 5$ and $K_n = 5$.
  • Figure 2: Transition of the best evaluation value on SphereCOM. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
  • Figure 3: Transition of the best evaluation value on SphereCOM. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
  • Figure 4: Transition of the best evaluation value on RosenbrockCLO. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
  • Figure 5: Transition of the best evaluation value on MCProximity. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proof