Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

Yifan Hu, Siqi Zhang, Xin Chen, Niao He

TL;DR

This work addresses conditional stochastic optimization (CSO), where the objective $F(x)=\mathbb{E}_\xi f_\xi(\mathbb{E}_{\eta|\xi} g_\eta(x,\xi))$ involves nested expectations that make unbiased gradient estimation difficult. It introduces two biased first-order methods, BSGD and the variance-reduced BSpiderBoost, which estimate the inner conditional expectation from a batch of conditional samples. The paper analyzes their nonasymptotic convergence in the strongly convex, convex, weakly convex, and nonconvex regimes and derives lower bounds under a biased first-order oracle model. The authors show that smoothness of the outer function affects how many inner samples are needed, yielding refined rates such as $\tilde{\mathcal{O}}(\epsilon^{-3})$ (convex, under certain conditions) and $\mathcal{O}(\epsilon^{-5})$ for smooth nonconvex CSO with BSpiderBoost, with corresponding implications for MAML. The lower bounds show that BSGD cannot be improved for general convex and nonconvex objectives, and that BSpiderBoost matches the lower bound in the smooth nonconvex setting with a Lipschitz continuous gradient estimator. Experiments on invariant logistic regression and MAML illustrate the performance of both methods.

Abstract

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives under smooth and non-smooth conditions. Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives except for smooth nonconvex objectives with Lipschitz continuous gradient estimator. For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound complexity. We further conduct numerical experiments on invariant logistic regression and model-agnostic meta-learning to illustrate the performance of BSGD and BSpiderBoost.
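To make the bias-variance tradeoff concrete, the following is a minimal sketch of a BSGD-style update on a toy scalar problem. The toy problem, the function name `bsgd_toy`, and all parameter values are assumptions chosen for illustration and do not come from the paper. The sketch only shows the general idea described above: replace the inner conditional expectation with an average over $m$ conditional samples and differentiate through that average.

```python
import numpy as np


def bsgd_toy(m, steps=20000, lr=0.02, sigma=1.0, seed=0):
    """Run a BSGD-style loop on a toy scalar CSO problem; return the tail-averaged iterate.

    Toy problem (an illustrative assumption, not taken from the paper):
        xi ~ N(0, 1),  eta | xi ~ N(xi, sigma^2),
        g_eta(x, xi) = eta * x,  f_xi(u) = 0.5 * (u - xi)^2,
    so F(x) = 0.5 * E[xi^2] * (x - 1)^2 and the true minimizer is x = 1.
    """
    rng = np.random.default_rng(seed)
    x = 0.0
    tail = []
    for t in range(steps):
        xi = rng.normal()                      # one outer sample
        eta = rng.normal(xi, sigma, size=m)    # m inner samples from P(eta | xi)
        eta_bar = eta.mean()
        g_hat = eta_bar * x                    # sample-mean estimate of E[g_eta(x, xi) | xi]
        # Gradient of f_xi(g_hat) with respect to x via the chain rule.
        # Its expectation differs from F'(x) because f_xi is nonlinear.
        grad = eta_bar * (g_hat - xi)
        x -= lr * grad
        if t >= steps // 2:                    # average the second half of the run
            tail.append(x)
    return float(np.mean(tail))


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, bsgd_toy(m))
```

In this toy problem the expected update direction is $(1+\sigma^2/m)\,x - 1$, so the iterates settle near $m/(m+\sigma^2)$ rather than at the true minimizer $1$. Increasing the inner batch size $m$ shrinks the bias at the cost of more samples per step, which is the tradeoff the sample-complexity results quantify.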

Paper Structure

This paper contains 43 sections, 15 theorems, 126 equations, 3 figures, 5 tables, 2 algorithms.

Key Result

Lemma 2.1

Under Assumption ass:general, for a sample $\xi$ and $m$ i.i.d. samples $\{\eta_{j}\}_{j=1}^m$ from the conditional distribution $P(\eta|\xi)$, and any $x\in\mathcal{X}$ that is independent of $\xi$ and $\{\eta_{j}\}_{j=1}^m$, we have

Figures (3)

  • Figure 1: BSGD for invariant logistic regression: (a) $\sigma_2^2/\sigma_1^2=1$, (b) $\sigma_2^2/\sigma_1^2=10$, (c) $\sigma_2^2/\sigma_1^2=100$.
  • Figure 2: (a) Convergence of BSGD under different inner batch sizes. (b) Convergence of BSGD, Adam, and BSpiderBoost. (c) Recovered sine-wave signals on an unseen task.
  • Figure 3: FO-MAML may not converge

Theorems & Definitions (17)

  • Lemma 2.1 (from hu2019sample)
  • Lemma 2.2
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4: Convergence of BSpiderBoost
  • Remark 3.1
  • Definition 4.1: Biased first-order oracle for CSO
  • Theorem 4.1
  • Corollary B.1
  • ...and 7 more