ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning

Pritam Kadasi, Abhishek Upperwal, Mayank Singh

TL;DR

ADAPT addresses the problem of allocating a fixed token budget across multiple instruction-tuning tasks. It introduces a differentiable bilevel meta-learning approach that learns a continuous task mixture under that budget, guided by a smooth worst-case validation objective and entropy regularization. On small open LLMs, the method is competitive with strong static baselines on downstream performance while improving training efficiency and shifting budget toward harder, more informative tasks. These findings suggest practical benefits for budget-constrained instruction tuning and motivate scaling up and exploring alternative meta-objectives in future work.

Abstract

We propose ADAPT, a meta-learning algorithm that \emph{learns} task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by hand, ADAPT maintains a continuous distribution over tasks and updates it via meta-gradients of a smooth worst-case validation objective, inducing an adaptive curriculum that allocates more tokens to useful tasks while avoiding collapse. We instantiate ADAPT on three $\sim$1B-parameter open-weight LLMs (Gemma-3-1B, LLaMA-3.2-1B, Qwen-0.6B), training on 20 Natural Instructions task types under budgets of $1\%$, $5\%$, and $10\%$ of the available supervised tokens, and compare against strong supervised fine-tuning baselines with uniform and size-proportional mixing. Evaluating on 11 out-of-domain benchmarks spanning reasoning, reading comprehension, code generation, and instruction following, we find that ADAPT matches or slightly improves average downstream performance relative to the best static mixture, while using fewer effective training tokens and reallocating budget toward harder, benchmark-aligned tasks.
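The ingredients named in the abstract (a continuous task distribution, a smooth worst-case validation objective, entropy regularization, and a token budget) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the softmax parameterization, the log-sum-exp surrogate for the maximum, the temperature `tau`, the weight `lam`, and all function names are hypothetical and are not taken from the paper. The sketch only evaluates an outer objective; the meta-gradient through the inner training loop is not shown.

```python
import numpy as np

def softmax(logits):
    """Map unconstrained task logits to a distribution over tasks (assumed parameterization)."""
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def smooth_worst_case(val_losses, tau=1.0):
    """Log-sum-exp surrogate for max(val_losses); tightens toward the max as tau -> 0.

    This is one common smooth maximum, assumed here; the paper may use another.
    """
    val_losses = np.asarray(val_losses, dtype=float)
    m = np.max(val_losses)
    return m + tau * np.log(np.sum(np.exp((val_losses - m) / tau)))

def entropy(p, eps=1e-12):
    """Shannon entropy of the task mixture; higher means less collapsed."""
    return -np.sum(p * np.log(p + eps))

def outer_objective(logits, val_losses, tau=1.0, lam=0.1):
    """Hypothetical outer loss: smooth worst-case validation loss minus an entropy bonus."""
    p = softmax(logits)
    return smooth_worst_case(val_losses, tau) - lam * entropy(p)

def allocate_tokens(p, budget_tokens):
    """Split a fixed token budget across tasks in proportion to the mixture p."""
    return np.floor(p * budget_tokens).astype(int)
```

With uniform logits over 20 tasks, `softmax` returns equal weights and `allocate_tokens` splits the budget evenly; raising one logit moves tokens toward that task while the entropy term penalizes putting all mass on it.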

Paper Structure

This paper contains 41 sections, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Win rates of AFT against static SFT baselines. For each base model and budget $b \in \{1,5,10\}\%$, we plot the fraction of the 11 benchmarks on which AFT matches or exceeds SFT-U and SFT-P at the same budget. AFT wins or ties on roughly half the tasks at $1\%$ and on a clear majority at $5$–$10\%$.
  • Figure 2: Budget efficiency across models and methods. For each base model and budget pair $(1\%\!,5\%)$, $(1\%\!,10\%)$, and $(5\%\!,10\%)$, bars show the percentage of tasks on which the smaller-budget run matches or exceeds the larger-budget run, highlighting diminishing returns beyond about $5\%$ of the training tokens.
  • Figure 3: Average evaluation score as a function of budget for AFT, SFT-U, and SFT-P on each base model. Curves are shown for budgets $\{0,1,5,10\}\%$ of the supervised tokens (with $0\%$ corresponding to zero-shot), illustrating steep gains from $0\%\!\to\!5\%$ and saturation between $5\%$ and $10\%$ while AFT closely tracks the best static baseline.
  • Figure 4: Validation loss as a function of cumulative training tokens for AFT, SFT-U, and SFT-P across base models and budgets. AFT consistently descends faster and reaches low-loss regimes with fewer tokens than the static SFT baselines, indicating better convergence efficiency under the same token budget.
  • Figure 5: Tokens (as a percentage of the budget $B$) required by AFT to match the best validation loss of the strongest SFT baseline (SFT-U or SFT-P) across budgets $B \in \{1,5,10\}\%$ for each base model. Lower values indicate that AFT reaches supervised quality with fewer tokens.
  • ...and 5 more figures
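
The quantities described in the captions of Figures 1 and 5 can be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: the function names, the input layout (one score per benchmark, and a loss curve sampled at increasing token counts), and the convention of returning `None` when the target loss is never reached are all hypothetical.

```python
def win_or_tie_rate(method_scores, baseline_scores):
    """Fraction of benchmarks on which the method matches or exceeds the baseline."""
    if len(method_scores) != len(baseline_scores):
        raise ValueError("score lists must cover the same benchmarks")
    wins = sum(m >= b for m, b in zip(method_scores, baseline_scores))
    return wins / len(method_scores)

def tokens_to_match(tokens, losses, target_loss, budget_tokens):
    """Percentage of the budget consumed when the loss curve first reaches target_loss.

    tokens: cumulative training tokens at each evaluation point (increasing).
    losses: validation loss at each of those points.
    Returns None if the curve never reaches the target (assumed convention).
    """
    for t, loss in zip(tokens, losses):
        if loss <= target_loss:
            return 100.0 * t / budget_tokens
    return None
```

For example, with made-up numbers that are not results from the paper, `win_or_tie_rate([1, 2, 3], [1, 3, 2])` counts a tie and a win out of three benchmarks, and `tokens_to_match([100, 200, 300], [2.0, 1.5, 1.0], 1.5, 400)` reports that the target is first reached at half of the budget.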