Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization
Dawid Płudowski, Antoni Zajko, Anna Kozak, Katarzyna Woźnica
TL;DR
This work tackles the problem of representing heterogeneous tabular datasets for meta-learning, focusing on warm-starting Bayesian hyperparameter optimization. It evaluates encoder-based representations, namely Dataset2Vec and a novel liltab-based encoder inspired by few-shot learning, on OpenML, UCI, and metaMIMIC datasets to determine their utility in transferring hyperparameter configurations. Although the encoders produce meaningful dataset clusterings, the study finds no consistent gain over simple baselines (including rank-based warm-start) in hyperparameter optimization, suggesting that general-purpose representations may not suffice for all meta-tasks. The findings motivate the development of task-aware representations and more effective heuristics for meta-learning in hyperparameter optimization, with potential impact on speeding up Bayesian optimization in heterogeneous tabular settings.
Abstract
Effectively representing heterogeneous tabular datasets for meta-learning remains an open problem. Previous approaches rely on predefined meta-features, for example, statistical measures or landmarkers. The emergence of dataset encoders opens new possibilities for meta-feature extraction because they require no handcrafted design. Moreover, they have been shown to generate dataset representations with desired spatial properties. In this research, we evaluate an encoder-based approach to one of the most established meta-tasks: warm-starting Bayesian hyperparameter optimization. To broaden our analysis, we introduce a new approach for representation learning on tabular data based on [Tomoharu Iwata and Atsutoshi Kumagai. Meta-learning from Tasks with Heterogeneous Attribute Spaces. In Advances in Neural Information Processing Systems, 2020]. Validation on over 100 datasets from UCI and an independent metaMIMIC set of datasets highlights the nuanced challenges in representation learning. We show that general representations may not suffice for some meta-tasks whose requirements are not explicitly considered during extraction.
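To make the meta-task concrete, the following is a minimal sketch of how an encoder-based warm-start could work. It assumes that each meta-training dataset already has an embedding and one best-known configuration, and that initial configurations are taken from the nearest datasets in Euclidean distance. The function name `warm_start_configs`, the toy embeddings, and the `max_depth` configurations are hypothetical illustrations, not the procedure or data used in this work.

```python
import numpy as np


def warm_start_configs(target_emb, meta_embs, best_configs, k=3):
    """Return the best-known configurations of the k meta-datasets
    whose embeddings lie closest to the target dataset's embedding.

    Assumption: proximity in embedding space is used as a proxy for
    similarity in hyperparameter response.
    """
    meta_embs = np.asarray(meta_embs, dtype=float)
    target_emb = np.asarray(target_emb, dtype=float)
    # Euclidean distance from the target to every meta-dataset embedding.
    dists = np.linalg.norm(meta_embs - target_emb, axis=1)
    # Indices of the k nearest meta-datasets, closest first.
    nearest = np.argsort(dists, kind="stable")[:k]
    return [best_configs[i] for i in nearest]


# Toy example with made-up 2-D embeddings and configurations.
meta_embs = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
best_configs = [{"max_depth": 3}, {"max_depth": 5}, {"max_depth": 9}]
initial = warm_start_configs([0.9, 0.1], meta_embs, best_configs, k=2)
```

The returned list would seed the first evaluations of a Bayesian optimizer in place of random initial points. Under this sketch, a warm-start helps only if nearby embeddings actually correspond to datasets with similar well-performing configurations, which is the property the study puts to the test.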
