Meta Learning not to Learn: Robustly Informing Meta-Learning under Nuisance-Varying Families

Louis McConnell

TL;DR

This work addresses out-of-distribution generalization under nuisance-varying task families by balancing positive and negative inductive biases in meta-learning. It introduces Robustly Informed Meta Learning (RIME), a method grounded in a causal framework that uses inverse probability weighting and a mutual-information penalty to separate nuisance factors $z$ from predictive signals, and that employs an Informed Neural Process to make task-conditioned predictions from a learned latent representation $C$. The paper formalizes the objective $R(\hat{p})=\sup_{e \in \mathcal{E}} - \mathbb{E}_{p(C)} \mathbb{E}_{p_e(x \mid C)} D_{\mathrm{KL}}\big(p_e(y \mid x, C) \,\|\, \hat{p}(y \mid x, C)\big)$ and the RIME loss $\mathcal{L}_{RIME}= \mathcal{L}_1 + \beta \mathcal{L}_2 + \lambda \mathcal{L}_3$, where $\mathcal{L}_3= \mathbb{I}_{p_{\perp\perp}}[(C, r_{\gamma}(x), y); z]$ penalizes residual information about $z$. The approach achieves state-of-the-art performance on distributionally robust objectives in nuisance-varying settings, both with and without task variability, and ablations highlight the importance of informed critics and accurate mutual-information enforcement. The work offers both a theoretical and an empirical framework for robust meta-learning under environment and task heterogeneity, with potential applications in medical imaging and other domains affected by site- and environment-specific shifts.
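The loss components above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: it assumes discrete $y$ and $z$, takes the inverse probability weights in the standard nuisance-randomization form $w = p(y)/p(y \mid z)$ estimated from empirical counts, and substitutes the Barber-Agakov variational bound $\mathbb{I}(U; z) \ge H(z) + \mathbb{E}[\log q(z \mid U)]$, with a critic $q$ that predicts $z$ from $U = (C, r_{\gamma}(x), y)$, as a stand-in estimator for $\mathcal{L}_3$. The function names (`nuisance_randomization_weights`, `weighted_mi_lower_bound`, `rime_loss`) are hypothetical, and $\mathcal{L}_1$ and $\mathcal{L}_2$ are treated as precomputed scalars.

```python
import numpy as np


def nuisance_randomization_weights(y, z):
    """Hypothetical IPW weights w_i = p(y_i) / p(y_i | z_i) from empirical counts.

    Assumes discrete labels y and discrete nuisance z. Reweighting by w makes
    y and z independent in the weighted sample, provided every (y, z)
    combination appears in the data (positivity).
    """
    y, z = np.asarray(y), np.asarray(z)
    w = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        p_y = np.mean(y == y[i])                      # marginal p(y_i)
        p_y_given_z = np.mean(y[z == z[i]] == y[i])   # conditional p(y_i | z_i)
        w[i] = p_y / p_y_given_z
    return w


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def weighted_mi_lower_bound(critic_logits, z, weights):
    """Weighted Barber-Agakov bound I(U; z) >= H(z) + E[log q(z | U)].

    critic_logits[i] are a critic's unnormalised scores for z given U_i,
    where U_i would be (C, r_gamma(x_i), y_i). This estimator is a stand-in
    assumption, not necessarily the one used in the paper.
    """
    z = np.asarray(z)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Entropy of z under the weighted empirical distribution.
    p_z = np.array([w[z == k].sum() for k in range(critic_logits.shape[1])])
    p_z = p_z[p_z > 0]
    h_z = -np.sum(p_z * np.log(p_z))
    # Weighted expected log-likelihood of the true z under the critic.
    log_q = log_softmax(critic_logits)[np.arange(len(z)), z]
    return h_z + np.sum(w * log_q)


def rime_loss(l1, l2, l3, beta=1.0, lam=1.0):
    """L_RIME = L1 + beta * L2 + lambda * L3, with precomputed components."""
    return l1 + beta * l2 + lam * l3
```

In an adversarial setup of this kind, the critic would be trained to tighten the bound while the representation is trained to reduce it. Because a lower bound can also be driven down by a weak critic rather than by actually removing information about $z$, the quality of the critic matters, which may relate to the ablations' emphasis on informed critics and accurate mutual-information enforcement.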

Abstract

In settings where both spurious and causal predictors are available, standard neural networks trained with empirical risk minimization (ERM) and no additional inductive biases tend to rely on spurious features. Additional inductive biases are therefore needed to guide the network toward hypotheses that generalize. These spurious features are often shared across related tasks, for example when estimating disease prognoses from image scans acquired at different hospitals, which makes generalization more difficult. In such settings, methods must integrate the right inductive biases to generalize across both nuisance-varying families and task families. Motivated by this setting, we present RIME (Robustly Informed Meta lEarning), a new meta-learning method that incorporates both positive and negative inductive biases (what to learn and what not to learn). We first develop a theoretical causal framework showing why existing approaches to knowledge integration can degrade performance on distributionally robust objectives. We then show that RIME integrates both kinds of bias simultaneously, achieving state-of-the-art performance on distributionally robust objectives in informed meta-learning under nuisance-varying families.

Paper Structure

This paper contains 21 sections, 12 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Causal structure of informed meta-learning under nuisance-varying families. Dark gray nodes represent latent variables, white nodes are observed variables, and light gray nodes are available only at training time. Dashed lines represent unstable (environment-specific) relationships.
  • Figure 2: Causal structures for the encoding and decoding processes of RIME. In the encoding stage, information from the context set $(r_{\gamma}(x_c), y_c)$ and the prior knowledge representation $K$ is mapped to the context $C$, a latent representation of the task function $f$. During decoding, the underlying causal factor $y_t$ is inferred probabilistically from the context and the target-input representation $r_{\gamma}(x_t)$.
  • Figure 3: $k$-shot evaluation cross-entropy loss from experiment 2 (lower is better; risk evaluated by lowest line). Dotted lines do not use reweighting / distillation; solid lines do. The bottom plot zooms in on the RIME variants and illustrates the effect of the knowledge-integration and critic setups, alongside baselines that use the optimal representation. Best-performing methods with learned representations are in grey ($K$-informed critic, knowledge of $b$) and lime ($C$-informed critic, knowledge of $b$).
  • Figure 4: $k$-shot evaluation cross-entropy loss from experiment 1 (lower is better). Dotted lines do not use reweighting / distillation; dashed lines do.
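The encoding and decoding steps described for Figure 2 can be illustrated with a toy forward pass. This sketch assumes a conditional-neural-process-style design: each context pair $(r_{\gamma}(x_c), y_c)$ is embedded and mean-pooled, the knowledge representation $K$ is embedded and added, and the decoder maps $C$ concatenated with $r_{\gamma}(x_t)$ to class probabilities for $y_t$. The dimensions, the random weights, the additive fusion of $K$, and the names `encode` and `decode` are hypothetical choices for illustration; the paper's architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: representation r_gamma(x), knowledge K, latent C, classes.
D_R, D_K, D_C, N_CLASSES = 4, 3, 8, 2

# Randomly initialised linear maps stand in for trained networks.
W_pair = rng.normal(size=(D_R + N_CLASSES, D_C))
W_know = rng.normal(size=(D_K, D_C))
W_dec = rng.normal(size=(D_C + D_R, N_CLASSES))


def encode(r_ctx, y_ctx, k):
    """Map context pairs (r_gamma(x_c), y_c) and knowledge K to a task latent C.

    Assumes at least one context point (k-shot with k >= 1).
    """
    y_onehot = np.eye(N_CLASSES)[y_ctx]
    pair_emb = np.tanh(np.concatenate([r_ctx, y_onehot], axis=1) @ W_pair)
    # Mean pooling makes C invariant to the order of the context set.
    return pair_emb.mean(axis=0) + np.tanh(k @ W_know)


def decode(c, r_tgt):
    """Return p(y_t | r_gamma(x_t), C) as a softmax over classes, one row per target."""
    c_tiled = np.broadcast_to(c, (r_tgt.shape[0], c.shape[0]))
    logits = np.concatenate([c_tiled, r_tgt], axis=1) @ W_dec
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Mean pooling is what lets a single encoder accept $k$-shot context sets of any size $k \ge 1$ and makes $C$ independent of the order in which context points are presented.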