
Transfer Learning of Surrogate Models: Integrating Domain Warping and Affine Transformations

Shuaiqun Pan, Diederick Vermetten, Manuel López-Ibáñez, Thomas Bäck, Hao Wang

TL;DR

The work addresses transferring surrogate models across tasks under nonlinear covariate shifts by jointly learning a nonlinear input warp implemented via per-dimension beta CDFs and an affine transformation, formalized as $f^{\mathrm{T}}(x)=f^{\mathrm{S}}(g(x))$ with $g(x)=W\phi(x)+v$ and $W\in SO(d)$. It provides both differentiable (with Riemannian gradient on $SO(d)$) and non-differentiable (via CMA-ES with a Lie algebra parameterization) pathways to fit the transformation using a small transfer dataset and empirical loss. Empirical results on the Black-Box Optimization Benchmark (BBOB) and an automotive ABS task show data-efficient gains in low-data regimes, particularly in higher dimensions, though the advantage declines as transfer data increases and on highly multimodal landscapes. The approach demonstrates practical potential for data-efficient surrogate reuse, with promising directions including active learning, extension to other regressors, and exploring faster warp alternatives like Kumaraswamy warping.
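The composition $f^{\mathrm{T}}(x)=f^{\mathrm{S}}(g(x))$ with $g(x)=W\phi(x)+v$ can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: inputs are assumed to be normalized to $[0,1]^d$, the rotation $W$ is obtained as the matrix exponential of a skew-symmetric matrix (one common way to realize a Lie algebra parameterization of $SO(d)$), and the closed-form Kumaraswamy CDF stands in for the beta CDF warp $\phi$. All function names (`skew_from_params`, `expm_taylor`, `make_transform`, `f_source`) and parameter values are hypothetical.

```python
import numpy as np


def skew_from_params(theta, d):
    """Map d(d-1)/2 free parameters to a skew-symmetric d x d matrix."""
    upper = np.zeros((d, d))
    upper[np.triu_indices(d, k=1)] = theta
    return upper - upper.T


def expm_taylor(A, terms=25):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = A / (2.0 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def kumaraswamy_cdf(x, a, b):
    """Closed-form stand-in for the per-dimension beta CDF warp (assumption)."""
    return 1.0 - (1.0 - x ** a) ** b


def make_transform(theta, v, warp):
    """Build g(x) = W @ warp(x) + v with W = exp(skew(theta)) in SO(d)."""
    d = len(v)
    W = expm_taylor(skew_from_params(theta, d))

    def g(x):
        return W @ warp(x) + v

    return g, W


def f_source(z):
    """Hypothetical source function (a sphere); a trained surrogate would go here."""
    return float(np.sum(z ** 2))


# Illustrative parameter values only; none of them come from the paper.
d = 3
theta = np.array([0.3, -0.5, 0.8])      # d(d-1)/2 rotation parameters
v = np.array([0.1, -0.2, 0.05])         # translation
a = np.array([1.2, 0.9, 2.0])           # per-dimension warp shapes
b = np.array([1.5, 1.1, 0.8])

g, W = make_transform(theta, v, lambda u: kumaraswamy_cdf(u, a, b))


def f_target(x):
    """Target function modeled as the source evaluated on transformed inputs."""
    return f_source(g(x))


x = np.array([0.2, 0.5, 0.9])
print(f_target(x))
```

Because the exponential of a skew-symmetric matrix is always orthogonal with determinant one, an unconstrained optimizer can search over `theta` freely without ever leaving $SO(d)$.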

Abstract

Surrogate models provide efficient alternatives to computationally demanding real-world processes but often require large datasets for effective training. A promising solution to this limitation is the transfer of pre-trained surrogate models to new tasks. Previous studies have investigated the transfer of differentiable and non-differentiable surrogate models, typically assuming an affine transformation between the source and target functions. This paper extends previous research by addressing a broader range of transformations, including linear and nonlinear variations. Specifically, we consider the combination of an unknown input warping, such as one modeled by the beta cumulative distribution function, with an unspecified affine transformation. Our approach achieves transfer learning by using a limited number of data points from the target task to optimize these transformations, minimizing the empirical loss on the transfer dataset. We validate the proposed method on the widely used Black-Box Optimization Benchmark (BBOB) testbed and a real-world transfer learning task from the automobile industry. The results show that the transferred surrogate significantly outperforms both the original surrogate and the one built from scratch using the transfer dataset, particularly in data-scarce scenarios.
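The fitting step, choosing transformation parameters that minimize the empirical loss on a small transfer dataset, can be illustrated with a deliberately tiny one-dimensional sketch. Everything here is an assumption made for illustration: the source surrogate is a hand-written quadratic, the warp is the closed-form Kumaraswamy CDF rather than the beta CDF, the affine part is reduced to a scalar shift, and an exhaustive grid search replaces the optimizers used in the paper (Riemannian gradient descent or CMA-ES).

```python
import itertools


def kumaraswamy_cdf(x, a, b):
    """Closed-form warp on [0, 1], used here in place of the beta CDF."""
    return 1.0 - (1.0 - x ** a) ** b


def source_surrogate(z):
    """Hypothetical stand-in for a pre-trained surrogate of the source task."""
    return (z - 0.3) ** 2


def transferred(x, a, b, v):
    """Source surrogate evaluated on the warped and shifted input."""
    return source_surrogate(kumaraswamy_cdf(x, a, b) + v)


def empirical_loss(params, data):
    """Mean squared error of the transferred surrogate on the transfer data."""
    a, b, v = params
    return sum((transferred(x, a, b, v) - y) ** 2 for x, y in data) / len(data)


# Synthetic transfer dataset generated from known (illustrative) parameters.
TRUE_PARAMS = (1.5, 2.0, 0.25)
transfer_x = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
transfer_data = [(x, transferred(x, *TRUE_PARAMS)) for x in transfer_x]

# Grid search over warp shapes and shift; a placeholder for a real optimizer.
shape_grid = [0.5 + 0.25 * i for i in range(11)]    # 0.5, 0.75, ..., 3.0
shift_grid = [-0.5 + 0.125 * i for i in range(9)]   # -0.5, ..., 0.5
candidates = itertools.product(shape_grid, shape_grid, shift_grid)
best_params = min(candidates, key=lambda p: empirical_loss(p, transfer_data))

best_loss = empirical_loss(best_params, transfer_data)
baseline_loss = empirical_loss((1.0, 1.0, 0.0), transfer_data)  # untransformed source
print(best_params, best_loss, baseline_loss)
```

The `baseline_loss` corresponds to reusing the source surrogate unchanged, which is the "original surrogate" comparison point in the abstract; only seven target evaluations are needed here because just three parameters are fitted.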

Paper Structure

This paper contains 17 sections, 6 equations, 23 figures, 3 tables.

Figures (23)

  • Figure 1: For the 2D F7 StepEllipsoid function, we show, from left to right: the contour lines of the source function ($f^{\text{S}}$), the target ($f^{\text{T}}$), the original GPR ($\hat{f}^{\text{S}}$) trained to approximate $f^{\text{S}}$, the GPR model trained from scratch with 40 samples from $f^{\text{T}}$, and the transferred GPR using the same samples. $f^{\text{T}}$ is created from $f^{\text{S}}$ by transforming the domain thereof with affine warping. We show the transfer effect with 80 data points in the last two subplots.
  • Figure 2: 2D input warping: The coordinate system is transformed from left to right by beta CDFs with shape parameter $\alpha=1.0558, \beta=1.9339$ for $x$-axis and $\alpha=0.8655,\beta=1.8148$ for $y$-axis. We show the contour lines of a sphere function on the left and its warped version on the right.
  • Figure 3: On 2D BBOB functions, we compare transferred GPR models with those trained from scratch on the transfer dataset. Each cell displays the percentage difference in average SMAPE (%) for a combination of BBOB functions, sample size, and beta CDF shape. Positive values (shown in red) indicate superior performance of the transferred model.
  • Figure 4: The SMAPE values ($y$-axis) for the original GPR, transferred GPR, and GPR trained solely on the transfer dataset are plotted against the transfer dataset sizes ($x$-axis: 5, 10, 15, 20, 30, 40, 80) for 2D BBOB functions. The analysis combines a beta CDF warping function (approximating an exponential transformation) with an affine transformation.
  • Figure 5: The ablation study focuses on the beta CDF warping function, with rotation and translation disabled, approximating an exponential transformation. We compare our results with reproduced code from PanVerLopBac2024transfer using box plots for 2D BBOB functions with a 20-sample transfer dataset. The plots show SMAPE values ($y$-axis) for the transferred GPR and a model trained solely on the transfer dataset across different transfer learning settings ($x$-axis).
  • ...and 18 more figures
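
The per-axis beta CDF warp described in the Figure 2 caption can be reproduced approximately with the standard library alone. The sketch below evaluates the regularized incomplete beta function with a textbook continued-fraction expansion and applies it with the shape parameters quoted in that caption. It assumes coordinates normalized to $[0,1]$ and assumes the warped sphere is the sphere evaluated at the warped coordinates; the function names are hypothetical and the code is not taken from the paper.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """CDF of the Beta(a, b) distribution, i.e. the regularized incomplete beta."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


# Shape parameters (alpha, beta) for the x- and y-axis from the Figure 2 caption.
FIG2_SHAPES = ((1.0558, 1.9339), (0.8655, 1.8148))


def warp_2d(point, shapes=FIG2_SHAPES):
    """Apply an independent beta CDF to each coordinate of a point in [0, 1]^2."""
    return tuple(beta_cdf(coord, a, b) for coord, (a, b) in zip(point, shapes))


def sphere(point):
    return sum(coord * coord for coord in point)


def warped_sphere(point):
    """Sphere function evaluated on the warped coordinates (assumed composition)."""
    return sphere(warp_2d(point))


print(warp_2d((0.5, 0.5)), warped_sphere((0.5, 0.5)))
```

Each coordinate is warped independently and monotonically, so the warp bends the contour lines without folding the domain onto itself.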

Theorems & Definitions (1)

  • Remark