Meta-Learning from Learning Curves for Budget-Limited Algorithm Selection

Manh Hung Nguyen, Lisheng Sun-Hosoya, Isabelle Guyon

TL;DR

The findings show that both meta-learning and the progression of learning curves enhance the algorithm selection process, as evidenced by methods of winning teams and the DDQN baseline, compared to heuristic baselines or a random search.

Abstract

Training a large set of machine learning algorithms to convergence in order to select the best-performing algorithm for a dataset is computationally wasteful. Moreover, in a budget-limited scenario, it is crucial to choose algorithm candidates carefully and to decide how much budget each one receives, so that the limited budget favors the most promising candidates. Casting this problem as a Markov Decision Process, we propose a novel framework in which an agent must select the most promising algorithm while training is still in progress, without waiting until it is fully trained. At each time step, given an observation of the partial learning curves of the algorithms, the agent must decide whether to allocate resources to further train the most promising algorithm (exploitation), to wake up another algorithm previously put to sleep, or to start training a new algorithm (exploration). In addition, our framework allows the agent to meta-learn from learning curves on past datasets, along with dataset meta-features and algorithm hyperparameters. By incorporating meta-learning, we aim to avoid myopic decisions based solely on premature learning curves on the dataset at hand. We introduce two benchmarks of learning curves that served in international competitions at WCCI'22 and AutoML-conf'22, and we analyze the results of these competitions. Our findings show that both meta-learning and the progression of learning curves enhance the algorithm selection process, as evidenced by the methods of the winning teams and our DDQN baseline, compared to heuristic baselines or a random search. Interestingly, our cost-effective baseline, which selects the algorithm that performs best after a small budget, can perform decently when learning curves do not intersect frequently.
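To make the budget-allocation setting concrete, the sketch below simulates a cost-effective baseline of the kind the abstract describes: give every algorithm a small probe budget, then spend what remains on the one that looks best. It is an illustration under stated assumptions, not the authors' implementation. The saturating curve shape, the helper names `make_curve` and `best_of_small_budget`, and all numeric values are hypothetical.

```python
import math


def make_curve(final_score, rate):
    """Hypothetical saturating learning curve: score as a function of training time."""
    return lambda t: final_score * (1.0 - math.exp(-rate * t))


def best_of_small_budget(curves, total_budget, probe_budget):
    """Probe each algorithm with a small budget, then commit the rest to the leader.

    Returns (chosen_index, final_score, history), where history holds one
    (algorithm_index, time_spent_on_it, observed_score) tuple per action.
    """
    if probe_budget <= 0 or probe_budget > total_budget:
        raise ValueError("probe_budget must be in (0, total_budget]")

    spent = [0.0] * len(curves)  # training time given to each algorithm so far
    remaining = total_budget
    history = []

    # Exploration: start algorithms one by one while the budget allows a probe.
    for j, curve in enumerate(curves):
        if remaining < probe_budget:
            break
        spent[j] += probe_budget
        remaining -= probe_budget
        history.append((j, spent[j], curve(spent[j])))

    # Exploitation: continue training the algorithm that led after probing.
    best = max(history, key=lambda h: h[2])[0]
    spent[best] += remaining
    history.append((best, spent[best], curves[best](spent[best])))
    return best, history[-1][2], history


# Crossing curves: algorithm 0 starts slowly but ends higher than algorithm 1.
crossing = [make_curve(0.9, 0.05), make_curve(0.7, 0.5)]
chosen, score, _ = best_of_small_budget(crossing, total_budget=100, probe_budget=5)
print(chosen, round(score, 3))
```

With these hypothetical curves the probe favors the fast starter (index 1), even though index 0 would reach a higher score given the same total training time. This is the kind of myopic decision that meta-learning from past learning curves is meant to avoid. When the curves do not cross, the same rule picks the eventual winner.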

Paper Structure

This paper contains 12 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Our problem setup. Given a dataset, an agent (meta-learner) $\mathcal{M}$ takes an action to start or continue training an algorithm using a budget, based on an observation containing partially revealed training and validation learning curves. The corresponding test learning curves are kept hidden and used for computing a reward to be returned to the agent. This interaction is repeated until the given total budget is exhausted.
  • Figure 2: MetaLC challenge results. Comparison of top-3 teams and five baselines. Blue bars represent methods that meta-learned from learning curves in meta-training (corresponding to ✓ in the first column of Table 1). We highlight RandSearch in plain gray, a special baseline with internally averaged performance over several runs. Results for fixed-time learning are included for analysis purposes only and were not officially used in our challenges. The reported results are from the worst run out of three runs with different seeds, and the error bar indicates the standard deviation across meta-test datasets.
  • Figure 3: Ablation study of DDQN baseline. Meta-learning and progression of learning curves improved DDQN's performance in both challenge rounds.
  • Figure 4: Trajectories of baseline DDQN and winning teams' methods on dataset Flora, capturing moments of algorithm transitions. Each marker corresponds to a choice of algorithm $\omega_j$, with the chosen algorithm's family denoted by the marker's color: SGD in blue, AdaBoost in orange, and KNN in purple. Transitions between algorithms are marked with red lines. (a) The DDQN agent began with a strong candidate and consistently selected it. It made a transition only when a performance plateau was reached. (b, c, d) Winning teams' agents exhibited less repetition in their choices and placed a greater emphasis on exploration to achieve better results. The different time ranges on the x-axis were chosen near the beginning of the episode, specifically targeting moments of transition.
  • Figure 5: Algorithms' learning curves with their final rankings. We show some datasets where the baseline BoS beats the baseline DDQN. First round: (a-g); second round: (h-p). Algorithms are color-coded based on their final ranking (i.e., by comparing the last points on their learning curves). In these datasets, the learning curves do not cross each other very often, and the algorithm that ranked first early tends to maintain a very high rank at the end. This illustrates scenarios where BoS beats DDQN. However, in practice, one cannot know in advance if the learning curves of algorithms will cross each other often on a given dataset.
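The caption of Figure 5 ties the success of BoS to how rarely learning curves cross. One simple way to quantify this, given complete curves sampled at the same time points, is to count sign changes in the pairwise score differences. The sketch below is an assumption-laden illustration rather than a measure taken from the paper; the function names `count_crossings` and `total_crossings` and the sample values are hypothetical.

```python
from itertools import combinations


def count_crossings(curve_a, curve_b):
    """Count sign changes of (a - b) for two curves sampled at the same time points.

    Ties (a == b) are skipped, so touching without overtaking is not counted.
    """
    crossings = 0
    prev_sign = 0
    for a, b in zip(curve_a, curve_b):
        diff = a - b
        sign = (diff > 0) - (diff < 0)  # -1, 0, or +1
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                crossings += 1
            prev_sign = sign
    return crossings


def total_crossings(curves):
    """Sum pairwise crossings over all pairs of algorithms on one dataset."""
    return sum(count_crossings(a, b) for a, b in combinations(curves, 2))


# Hypothetical validation scores of three algorithms at four checkpoints.
curves = [
    [0.10, 0.30, 0.50, 0.60],
    [0.20, 0.25, 0.55, 0.58],
    [0.05, 0.10, 0.15, 0.20],
]
print(total_crossings(curves))
```

As the caption notes, such a count is only available in hindsight, once the full curves are known, so it can describe a benchmark but cannot guide the agent on a new dataset.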