LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection

Youssef Attia El Hili, Albert Thomas, Malik Tiomoko, Abdelhakim Benechehab, Corentin Léger, Corinne Ancourt, Balázs Kégl

TL;DR

The paper tackles the combined algorithm selection and hyperparameter optimization (CASH) problem by exploring whether large language models can act as in-context meta-learners that recommend model families and hyperparameters from dataset metadata, with and without prior task examples. It formalizes the problem, introduces zero-shot and meta-informed prompting strategies, and validates them on synthetic and real-world tabular tasks, including 22 Kaggle challenges. In synthetic experiments, a 72B LLM demonstrates robust in-context adaptation as context grows; in real tasks, the Meta-Informed prompt achieves the best average performance, approaching expert-driven selections while drastically reducing search costs. The results suggest LLMs can serve as lightweight, general-purpose assistants that complement AutoML pipelines, offering strong task-dependent defaults and cross-task generalization for model selection and hyperparameter optimization.

Abstract

Model and hyperparameter selection are critical but challenging in machine learning, typically requiring expert intuition or expensive automated search. We investigate whether large language models (LLMs) can act as in-context meta-learners for this task. By converting each dataset into interpretable metadata, we prompt an LLM to recommend both model families and hyperparameters. We study two prompting strategies: (1) a zero-shot mode relying solely on pretrained knowledge, and (2) a meta-informed mode augmented with examples of models and their performance on past tasks. Across synthetic and real-world benchmarks, we show that LLMs can exploit dataset metadata to recommend competitive models and hyperparameters without search, and that improvements from meta-informed prompting demonstrate their capacity for in-context meta-learning. These results highlight a promising new role for LLMs as lightweight, general-purpose assistants for model selection and hyperparameter optimization.
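The two prompting strategies described in the abstract can be illustrated with a minimal sketch. The prompt wording, the metadata fields, the JSON reply format, and the helper names `build_prompt` and `parse_recommendation` are all assumptions made for illustration; they are not taken from the paper.

```python
import json


def build_prompt(task_meta, support=None):
    """Build a prompt from dataset metadata.

    With support=None this is a zero-shot prompt. When `support` is a list of
    (metadata, config, score) tuples from past tasks, the prompt becomes a
    meta-informed one. The wording below is hypothetical.
    """
    lines = ["You are assisting with model and hyperparameter selection for a tabular task."]
    if support:
        lines.append("Examples from past tasks (metadata, configuration, score):")
        for meta, config, score in support:
            example = {"metadata": meta, "config": config, "score": score}
            lines.append(json.dumps(example, sort_keys=True))
    lines.append("New task metadata:")
    lines.append(json.dumps(task_meta, sort_keys=True))
    lines.append("Reply with a JSON object with keys 'model' and 'hyperparameters'.")
    return "\n".join(lines)


def parse_recommendation(reply):
    """Parse the (assumed) JSON reply into a (model, hyperparameters) pair."""
    rec = json.loads(reply)
    if "model" not in rec or "hyperparameters" not in rec:
        raise ValueError("reply must contain 'model' and 'hyperparameters'")
    return rec["model"], rec["hyperparameters"]
```

Calling `build_prompt(meta)` gives the zero-shot variant, while `build_prompt(meta, support)` adds the past-task examples. The resulting string would be sent to whichever LLM client is in use, and the reply passed to `parse_recommendation`.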

Paper Structure

This paper contains 53 sections, 1 theorem, 22 equations, 6 figures, 5 tables.

Key Result

Theorem 1

Under the assumptions above, for any fixed regularization $\lambda>0$ the distribution of the ridge score $s(\mathbf{x})=\widehat{\mathbf{w}}(\lambda)^\top \mathbf{x}$ conditional on $\mathbf{x}$ belonging to class $k$ converges in distribution to a Gaussian with mean $m_k$ and variance $v_k$ as $d \to \infty$, where $m_k,v_k$ are given by the deterministic formulas above (they are computed from the unique so

Figures (6)

  • Figure 1: Overview of the method. Each task is represented by metadata, and the LLM outputs model and hyperparameter configurations. The dotted arrow indicates the inclusion of prior-task metadata-configuration pairs in the meta-informed setting.
  • Figure 2: Regret vs. number of support tasks $k$, averaged across decoding temperatures. The dashed line represents a static geometric-mean baseline. Shaded regions denote 90% confidence intervals: for model predictions, intervals are computed from the standard error over 5000 trials (1000 per temperature); for the baselines, intervals reflect 1000 trials. The 72B model is the only model to consistently outperform the baselines as $k$ increases, indicating scale-dependent emergence of in-context meta-learning.
  • Figure 3: Comparison of prompting strategies and baselines in terms of $p_{\text{rank}}$. The Context Blends produced by AutoML performance for each challenge are shown as a reference. Error bars indicate 90% confidence intervals of the mean across 8 random seeds per dataset.
  • Figure 4: $p_{\text{rank}}$ over training rounds for Random-Hyperopt, MaxUCB-Hyperopt, Meta-Informed, and Zero-Shot across the six selected datasets. Error bars indicate 90% confidence intervals using standard error across 8 seeds.
  • Figure 5: Regret vs. number of support tasks $k$ for Qwen 2.5 models at five decoding temperatures (T=0.0 to 0.8). Shaded regions denote 90% confidence intervals based on standard error across 1000 trials. Only the 72B model shows consistent improvement with increasing $k$, with minimal effect of temperature across all models.
  • ...and 1 more figure
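The captions above report 90% confidence intervals derived from the standard error of the mean. A minimal sketch of such an interval is shown here, assuming a normal approximation; the captions do not say which quantile the authors used, and with only 8 seeds a Student-t quantile would give a somewhat wider interval. The function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.90):
    """Return (low, mean, high) for a two-sided interval around the mean.

    Assumes a normal approximation: mean +/- z * standard error, where the
    standard error is the sample standard deviation divided by sqrt(n).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard error")
    center = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    # Two-sided quantile, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    return center - half_width, center, center + half_width
```

For example, `mean_confidence_interval(scores)` applied to the per-seed scores of one dataset would give the kind of error bar drawn in Figures 3 and 4, under the stated normal-approximation assumption.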

Theorems & Definitions (3)

  • Remark: Applicability in low dimensions
  • Theorem 1: Asymptotic Gaussianity and deterministic test error
  • Proof: Proof sketch