Learning to Actively Learn: A Robust Approach

Jifan Zhang, Lalit Jain, Kevin Jamieson

TL;DR

This work tackles designing robust adaptive data-collection algorithms for scenarios with very small budgets, where traditional concentration-based guarantees are weak. It introduces a framework that learns a single adaptive policy through adversarial training over difficulty classes defined by an instance-dependent complexity $\mathcal{C}(\theta)$, instantiated as $\widetilde{\rho}(\theta)$ for combinatorial bandits. The authors propose MAPO, a differentiable, minimax policy optimization method that uses nested difficulty sets $\Theta^{(r_k)}$ and a softmax reparameterization to train a policy $\pi^{\psi}$ that minimizes the worst-case suboptimality gap. Experiments on synthetic threshold tasks and real data (20 Questions, Jester) show MAPO achieves robust, instance-adaptive performance in the low-budget regime, often matching or beating strong baselines without requiring test-time priors.
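The summary above mentions a softmax reparameterization inside a minimax objective but does not spell out how it works. As a rough illustration only, and not the authors' MAPO procedure, the sketch below shows one way a hard maximum over a finite set of candidate problem instances can be replaced by a softmax-weighted average whose weights are trained by gradient ascent. The function names (`softmax`, `adversary_step`, `soft_worst_case`), the fixed loss vector, and the step size are all hypothetical choices made for this example; in the actual method the losses would depend on the policy $\pi^{\psi}$, which would be updated at the same time.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def adversary_step(logits, losses, lr=1.0):
    """One gradient-ascent step on the softmax-weighted loss sum_i w_i * loss_i.

    Assumption: `losses` holds the policy's loss on each candidate instance
    and is treated as fixed during this step.
    """
    w = softmax(logits)
    expected = float(w @ losses)
    # d/d logit_j of sum_i w_i * loss_i equals w_j * (loss_j - expected)
    grad = w * (losses - expected)
    return logits + lr * grad

def soft_worst_case(losses, steps=500, lr=1.0):
    """Run the adversary for a fixed number of steps, starting from uniform weights."""
    logits = np.zeros_like(losses, dtype=float)
    for _ in range(steps):
        logits = adversary_step(logits, losses, lr)
    w = softmax(logits)
    return w, float(w @ losses)

if __name__ == "__main__":
    # Hypothetical losses of one fixed policy on three candidate instances.
    losses = np.array([0.1, 0.5, 0.3])
    w, value = soft_worst_case(losses)
    print("adversary weights:", w)
    print("soft worst-case loss:", value)
```

With fixed losses, the weights drift toward the instance with the largest loss, so the weighted average approaches the true maximum from below while staying differentiable in the logits. That differentiability is what allows a policy and an adversary to be trained jointly by gradient methods.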

Abstract

This work proposes a procedure for designing algorithms for specific adaptive data collection tasks like active learning and pure-exploration multi-armed bandits. Unlike the design of traditional adaptive algorithms that rely on concentration of measure and careful analysis to justify the correctness and sample complexity of the procedure, our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information-theoretic lower bounds. In particular, a single adaptive learning algorithm is learned that competes with the best adaptive algorithm learned for each equivalence class. Our procedure takes as input just the available queries, set of hypotheses, loss function, and total query budget. This is in contrast to existing meta-learning work that learns an adaptive algorithm relative to an explicit, user-defined subset or prior distribution over problems, which can be challenging to define and may be mismatched to the instance encountered at test time. This work is particularly focused on the regime where the total query budget is very small, such as a few dozen, which is much smaller than the budgets typically considered by theoretically derived algorithms. We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data, including a noisy 20 Questions game and a joke recommendation task.

Paper Structure

This paper contains 25 sections, 15 equations, 13 figures, 4 tables, 3 algorithms.

Figures (13)

  • Figure 1: Performance curves for various policies.
  • Figure 2: Learned policies, lower is better
  • Figure 3: Sub-optimality of individual policies, lower is better
  • Figure 4: Max $\{\theta:\widetilde{\rho}(\theta)\leq r \}$, lower is better
  • Figure 5: Average $\mathbb{E}_{\theta \sim \mathcal{P}_h}[ \cdot ]$, lower is better
  • ...and 8 more figures