CatCMA : Stochastic Optimization for Mixed-Category Problems

Ryoki Hamano; Shota Saito; Masahiro Nomura; Kento Uchida; Shinichi Shirakawa

TL;DR

CatCMA tackles mixed-category black-box optimization (MC-BBO) by learning a joint search distribution over continuous and categorical variables, using information geometric optimization (IGO) to update a joint Gaussian–categorical model. It combines CMA-ES-style rank-one updates and step-size adaptation with ASNG-inspired learning-rate adaptation for the categorical part, along with an analytically derived margin correction that prevents premature convergence. The margin analysis and numerical experiments indicate that CatCMA performs better and is more robust than state-of-the-art Bayesian optimization methods for MC-BBO such as CASMOPOLITAN and TPE, especially in higher dimensions. The work provides a practical, scalable framework for jointly optimizing continuous and categorical decisions in black-box settings.

Abstract

Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, so the approach of discretizing continuous variables does not work well for them. Although several Bayesian optimization methods can deal with mixed-category black-box optimization (MC-BBO), they suffer from poor scalability to high-dimensional problems and a high internal computational cost. This paper proposes CatCMA, a stochastic optimization method for MC-BBO problems, which employs the joint probability distribution of multivariate Gaussian and categorical distributions as the search distribution. CatCMA updates the parameters of the joint probability distribution in the natural gradient direction. CatCMA also incorporates the acceleration techniques used in the covariance matrix adaptation evolution strategy (CMA-ES) and the stochastic natural gradient method, such as step-size adaptation and learning rate adaptation. In addition, we restrict the ranges of the categorical distribution parameters with a margin to prevent premature convergence and analytically derive a promising margin setting. Numerical experiments show that CatCMA performs better than state-of-the-art Bayesian optimization algorithms and is more robust to the problem dimension.
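The mechanics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it only shows sampling from a product of a Gaussian and per-dimension categorical distributions, a plain natural-gradient step on the categorical parameters, and one common way to enforce a lower bound (margin) on each probability. The function names, the weights, the learning rate `eta`, and the margin value `q_min` are all hypothetical, and the sketch assumes every categorical dimension has the same number of categories and that `q_min` is below `1 / k`. The Gaussian updates, rank-one update, step-size adaptation, and learning-rate adaptation are omitted.

```python
import numpy as np


def sample_joint(rng, m, sigma, C, q, lam):
    """Draw lam candidates from N(m, sigma^2 C) x Cat(q), sampled independently."""
    n_ca, k = q.shape
    # Continuous part: shape (lam, n_co).
    x = rng.multivariate_normal(m, (sigma ** 2) * C, size=lam)
    # Categorical part: one category index per dimension, shape (lam, n_ca).
    c = np.stack([rng.choice(k, size=lam, p=q[n]) for n in range(n_ca)], axis=1)
    return x, c


def update_categorical(q, c, weights, eta):
    """Natural-gradient step: q <- q + eta * sum_i w_i * (onehot(c_i) - q).

    `weights` must be ordered like the samples in `c` (assumed rank-based).
    """
    k = q.shape[1]
    onehot = np.eye(k)[c]  # shape (lam, n_ca, k)
    grad = np.einsum("i,ink->nk", weights, onehot - q)
    return q + eta * grad


def apply_margin(q, q_min):
    """Keep every probability >= q_min while each row still sums to 1."""
    q = np.maximum(q, q_min)
    excess = q - q_min
    # Remove the surplus mass proportionally to how far each entry is above q_min.
    scale = (1.0 - q.sum(axis=1, keepdims=True)) / excess.sum(axis=1, keepdims=True)
    return q + scale * excess
```

In a full loop, the candidates would be evaluated on the objective, sorted, and assigned weights before `update_categorical` is called, with `apply_margin` applied after each update so that no category's probability collapses to zero.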

Paper Structure (26 sections, 1 theorem, 34 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Information Geometric Optimization
Proposed Method
Search Space and Joint Probability Distribution
Definition of Search Space
Joint Probability Distribution for CatCMA
Derivation of IGO Update with Joint Probability Distribution
Introducing Enhancement Mechanisms
Rank-one update and step-size adaptation for the covariance matrix update
Learning rate adaptation for the categorical distribution update
Post-processing for the Multivariate Gaussian Distribution
Margin Correction for the Categorical Distribution
Promising Margin Setting
Behaviours under Inappropriate Margins
Derivation of Promising Margin
...and 11 more sections

Key Result

Proposition 1

Without loss of generality, assume that the categories of the optimal solution are the first categories in all dimensions and that the parameter of the categorical distribution satisfies for all $n \in \{1, \ldots, {N_{\mathrm{ca}}}\}$. Let $\lambda_\mathrm{non}$ be the random variable that counts the number of samples containing non-optimal categories among the $\lambda$ samples. If for a constant $\xi$ …
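To make the quantity $\lambda_\mathrm{non}$ concrete, here is a small sketch under a simplifying assumption that is not taken from the truncated statement above: every categorical dimension selects the optimal (first) category with the same probability `p_opt`, independently across dimensions and samples. Under that assumption a single sample contains only optimal categories with probability `p_opt ** n_ca`, so $\Pr(\lambda_\mathrm{non} = 0)$ equals `p_opt ** (n_ca * lam)`. The function names and the numeric values are hypothetical.

```python
import random


def prob_no_nonoptimal(p_opt, n_ca, lam):
    """Pr(lambda_non = 0) when each dimension picks the optimal category w.p. p_opt."""
    return p_opt ** (n_ca * lam)


def simulate_lambda_non(p_opt, n_ca, lam, rng):
    """Count the samples (out of lam) that contain at least one non-optimal category."""
    count = 0
    for _ in range(lam):
        # A dimension is non-optimal when its uniform draw is at or above p_opt.
        if any(rng.random() >= p_opt for _ in range(n_ca)):
            count += 1
    return count


def estimate_prob_no_nonoptimal(p_opt, n_ca, lam, trials, seed=0):
    """Monte Carlo estimate of Pr(lambda_non = 0) for comparison with the closed form."""
    rng = random.Random(seed)
    hits = sum(simulate_lambda_non(p_opt, n_ca, lam, rng) == 0 for _ in range(trials))
    return hits / trials
```

Comparing `estimate_prob_no_nonoptimal` with `prob_no_nonoptimal` for a few values of `p_opt` shows how quickly the chance of drawing only optimal categories drops as `n_ca` or `lam` grows, which is the kind of trade-off a margin setting has to account for.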

Figures (7)

Figure 1: Transition of the best evaluation value, the eigenvalues of $(\sigma^{(t)})^2 \boldsymbol{C}^{(t)}$, and probability of generating the best category $\boldsymbol{q}^{(t)}_{n,1}$ in one typical trial of optimizing SphereCOM with ${N_{\mathrm{co}}} = {N_{\mathrm{ca}}} = 5$ and $K_n = 5$.
Figure 2: Transition of the best evaluation value on SphereCOM. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
Figure 3: Transition of the best evaluation value on SphereCOM. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
Figure 4: Transition of the best evaluation value on RosenbrockCLO. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
Figure 5: Transition of the best evaluation value on MCProximity. The line and shaded area denote the medians and interquartile ranges over 20 independent trials, respectively.
...and 2 more figures

Theorems & Definitions (2)

Proposition 1
Proof
