Exploratory Landscape Analysis for Mixed-Variable Problems

Raphael Patrick Prager, Heike Trautmann

TL;DR

This work extends Exploratory Landscape Analysis (ELA) to mixed-variable problems (MVPs) by encoding categorical and hierarchical decision variables into numerical representations, enabling existing ELA feature sets to characterize MVP landscapes. It evaluates two encoding schemes, one-hot (OH) and target encoding (TE), and applies a preprocessing pipeline to handle hierarchical dependencies, scaling, and categorical variables. An automated algorithm selection (AAS) study on 702 MVPs from YAHPO Gym shows that TE-based features are faster to compute and yield superior selection performance, closing the gap between the single best solver (SBS) and the virtual best solver (VBS) by approximately 57.5%. The results support the discriminative power of MVP-specific landscape features and highlight TE as the preferred encoding for future MVP landscape analysis and AAS research, while outlining avenues for improved sampling and hierarchical handling.

Abstract

Exploratory landscape analysis and fitness landscape analysis in general have been pivotal in facilitating problem understanding, algorithm design, and endeavors such as automated algorithm selection and configuration. These techniques have largely been limited to search spaces of a single domain. In this work, we provide the means to compute exploratory landscape features for mixed-variable problems where the decision space is a mixture of continuous, binary, integer, and categorical variables. This is achieved by utilizing existing encoding techniques originating from machine learning. We provide a comprehensive juxtaposition of the results based on these different techniques. To further highlight their merit for practical applications, we design and conduct an automated algorithm selection study based on a hyperparameter optimization benchmark suite. We derive a meaningful compartmentalization of these benchmark problems by clustering based on the landscape features used. The identified clusters mirror the behavior of the algorithms used; that is, different clusters have different best-performing algorithms. Finally, our trained algorithm selector is able to close the gap between the single best and the virtual best solver by 57.5% over all benchmark problems.
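
To make the two encoding schemes concrete, the following is a minimal Python sketch of one-hot and target encoding applied to a single categorical variable. It is an illustration under assumptions, not the authors' implementation: the function names `one_hot_encode` and `target_encode`, the sample values, and the use of a plain per-category mean (without any smoothing or regularization) are all hypothetical choices made here.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator row, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def target_encode(values, fitness):
    """Replace each category by the mean objective value observed for it.

    Assumption: a plain mean per category; practical variants often add smoothing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(values, fitness):
        sums[v] += y
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return means, [means[v] for v in values]


# Hypothetical sample: one categorical variable and the observed objective values.
x_cat = ["a", "b", "a", "c"]
y = [1.0, 4.0, 3.0, 10.0]

categories, oh = one_hot_encode(x_cat)
means, te = target_encode(x_cat, y)
```

In this sketch, one-hot encoding produces one column per category, so the encoded dimension grows with the cardinality of the variable, whereas target encoding keeps a single numerical column per categorical variable.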

Paper Structure (13 sections, 5 equations, 9 figures, 3 tables)

This paper contains 13 sections, 5 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: High-level overview of the different steps prior to ELA feature computation for MVP.
  • Figure 2: Exemplary hierarchical fitness landscape. The domain of $X_{cat}$ comprises the values a and b. $X_{cont}$ only affects the fitness landscape when $X_{cat} = b$.
  • Figure 3: Frequency of the best-performing solver per scenario. The text on the x-axis states the problem dimension, the scenario name, the number of categorical variables, and the sum of the cardinalities of said categorical variables. The actual occurrences are as follows: SM $448$, RS $125$, EA $97$, OP $32$.
  • Figure 4: Performance comparison of each individual solver contrasted with the VBS on a log scale. When a solver performs equivalently to the VBS on a given problem instance, the point lies on the diagonal line separating each plot. The horizontal distance to that line quantifies how much worse a specific solver is compared to the VBS. The vertical dashed line represents the worst possible ERT, where only a single repetition out of $20$ reaches the target.
  • Figure 5: Computation time of ELA feature sets grouped by encoding and dimensionality of the problem. The x-axis depicts the total cardinality of the decision space whereas the y-axis shows the required time in seconds to calculate a respective feature set.
  • ...and 4 more figures
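
The quantities named in the captions and the abstract (ERT, SBS, VBS, and the share of the gap that a selector closes) can be sketched in a few lines of Python. This is a hedged illustration only: it assumes the common definition of expected running time (total evaluations over all runs divided by the number of successful runs) and the usual gap formula (SBS − selector) / (SBS − VBS) with lower scores being better. The paper's exact definitions may differ, and all function names and numbers below are hypothetical.

```python
def expected_running_time(evaluations, successes):
    """ERT: total evaluations over all runs divided by the number of successful runs."""
    n_success = sum(successes)
    if n_success == 0:
        return float("inf")  # no run reached the target
    return sum(evaluations) / n_success


def sbs_and_vbs(perf):
    """perf maps solver name -> list of per-instance scores (lower is better).

    Returns the name and mean score of the single best solver (SBS) and the
    mean score of the virtual best solver (VBS), i.e. the per-instance minimum.
    """
    solvers = list(perf)
    n = len(perf[solvers[0]])
    mean = {s: sum(perf[s]) / n for s in solvers}
    sbs_name = min(mean, key=mean.get)
    vbs = sum(min(perf[s][i] for s in solvers) for i in range(n)) / n
    return sbs_name, mean[sbs_name], vbs


def gap_closed(sbs, vbs, selector):
    """Fraction of the SBS-VBS gap closed by a selector (lower scores are better)."""
    if sbs == vbs:
        raise ValueError("SBS and VBS coincide; the gap is undefined")
    return (sbs - selector) / (sbs - vbs)


# Hypothetical worst case: 20 repetitions with a budget of 100 evaluations each,
# and only the last repetition reaches the target.
budget = 100
worst_ert = expected_running_time([budget] * 20, [False] * 19 + [True])

# Hypothetical per-instance scores for two of the solvers named in Figure 3.
sbs_name, sbs_score, vbs_score = sbs_and_vbs({"SM": [1.0, 4.0, 2.0], "RS": [3.0, 2.0, 6.0]})

# Hypothetical aggregated scores chosen so that the selector closes about 57.5% of the gap.
closed = gap_closed(sbs=5.0, vbs=1.0, selector=2.7)
```

A value of 0 for `gap_closed` means the selector is no better than always picking the SBS, and a value of 1 means it matches the VBS on every instance.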