A mixed-categorical correlation kernel for Gaussian process
P. Saves, Y. Diouane, N. Bartoli, T. Lefebvre, J. Morlier
TL;DR
The paper addresses surrogate modeling for expensive mixed-input problems by extending exponential kernels to mixed continuous, integer and categorical inputs. The resulting GP surrogate uses a product-based correlation k(w^r,w^s) = k^{cont}(x^r,x^s) k^{int}(z^r,z^s) k^{cat}(c^r,c^s). The paper introduces an Exponential Homoscedastic Hypersphere (EHH) kernel family that unifies the Gower distance (GD) and continuous relaxation (CR) approaches, proves that the resulting correlation matrices are symmetric positive definite (SPD), and shows that EHH generalizes CR and GD while often outperforming FE on analytic and engineering benchmarks. The approach is implemented in SMT v2.0 and evaluated on cosine, cantilever beam, and aircraft design problems, where it gives a higher likelihood and lower residuals with a tractable number of hyperparameters. The work points to future enhancements, such as KPLS-based dimension reduction, to handle higher-dimensional and more complex mixed inputs and to enable efficient Bayesian optimization in industrial settings.
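To make the product structure k(w^r,w^s) = k^{cont} k^{int} k^{cat} concrete, here is a minimal NumPy sketch. It is not the paper's EHH kernel and not SMT's API: the function names, the `thetas` dictionary, and the sample points are hypothetical. It assumes a squared-exponential kernel for the continuous part, treats integers by relaxing them to reals, and uses a simple Gower-like mismatch kernel for the categorical part.

```python
import numpy as np

def k_cont(x_r, x_s, theta):
    # Squared-exponential correlation with one length-scale weight per dimension.
    x_r, x_s, theta = (np.asarray(a, dtype=float) for a in (x_r, x_s, theta))
    return float(np.exp(-np.sum(theta * (x_r - x_s) ** 2)))

def k_int(z_r, z_s, theta):
    # Assumption: integers are relaxed to reals and reuse the continuous kernel.
    return k_cont(z_r, z_s, theta)

def k_cat(c_r, c_s, theta):
    # Assumption: Gower-like kernel that only penalizes mismatching levels.
    mismatch = (np.asarray(c_r) != np.asarray(c_s)).astype(float)
    return float(np.exp(-np.sum(np.asarray(theta, dtype=float) * mismatch)))

def k_mixed(w_r, w_s, thetas):
    # Product of the three partial correlations, as in the formula above.
    (x_r, z_r, c_r), (x_s, z_s, c_s) = w_r, w_s
    return (k_cont(x_r, x_s, thetas["cont"])
            * k_int(z_r, z_s, thetas["int"])
            * k_cat(c_r, c_s, thetas["cat"]))

def correlation_matrix(points, thetas):
    # Dense n x n correlation matrix over a list of mixed points.
    n = len(points)
    R = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            R[i, j] = k_mixed(points[i], points[j], thetas)
    return R

# Hypothetical data: each point is (continuous part, integer part, categorical part).
points = [
    ([0.1], [1], ["red"]),
    ([0.4], [2], ["blue"]),
    ([0.9], [1], ["red"]),
    ([0.5], [3], ["green"]),
]
thetas = {"cont": [2.0], "int": [0.5], "cat": [1.0]}

R = correlation_matrix(points, thetas)
min_eig = float(np.linalg.eigvalsh(R).min())  # positive for an SPD matrix
```

Each factor here is itself a valid correlation kernel, so their product is as well, and `R` has a unit diagonal and is symmetric positive definite for these distinct points. The paper's EHH kernel replaces the simple categorical factor with a richer parametrization; this sketch only illustrates how the three factors combine.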
Abstract
Recently, there has been a growing interest in mixed-categorical meta-models based on Gaussian process (GP) surrogates. In this setting, existing approaches either use continuous kernels (e.g., continuous relaxation and Gower-distance-based GP) or directly estimate the correlation matrix. In this paper, we present a kernel-based approach that extends continuous exponential kernels to handle mixed-categorical variables. The proposed kernel leads to a new GP surrogate that generalizes both the continuous relaxation and the Gower-distance-based GP models. We demonstrate, on both analytical and engineering problems, that our proposed GP model gives a higher likelihood and a smaller residual error than the other kernel-based state-of-the-art models. Our method is available in the open-source software SMT.
