Hacking Task Confounder in Meta-Learning

Jingyao Wang, Yi Ren, Zeen Song, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

TL;DR

The paper identifies Task Confounders as cross-task spurious correlations that degrade meta-learning generalization and analyzes them with Structural Causal Models. It proposes MetaCRL, a plug-and-play causal representation learner with a Disentangling Module and a Causal Module that enforce decoupled generating factors and invariant causality through bi-level optimization. Across sinusoid regression, image classification, drug activity prediction, and pose estimation, MetaCRL consistently yields state-of-the-art results and reduces negative transfer, validating the importance of causal representation in meta-learning. The approach offers a practical pathway to more robust meta-learners by explicitly decoupling task-specific factors and enforcing invariance to distribution shifts during training.

Abstract

Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, which degrades generalization performance. To explain this phenomenon, we construct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across batches. We refer to these confounding factors as "Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and uses an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.

Paper Structure (37 sections, 2 theorems, 20 equations, 13 figures, 9 tables)

This paper contains 37 sections, 2 theorems, 20 equations, 13 figures, 9 tables.

Key Result

Theorem 1

If the correlation between $Y_i$ and $Y_j$ is not equal to 0.5, the optimal classifier has non-zero weights on non-causal factors for each task. If the correlation between $Y_i$ and $Y_j$ equals 0.5 but the training data are limited, the optimal classifier also has non-zero weights on non-causal factors.
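
The intuition behind the theorem can be checked numerically. The following is a minimal sketch under a toy model that is an assumption of this example, not the paper's construction: two tasks with balanced labels in {-1, +1}, "correlation" read as the agreement probability $P(Y_i = Y_j)$ (so 0.5 means the labels are uncorrelated), one causal feature per task equal to its label plus independent Gaussian noise, and a least-squares linear predictor standing in for the optimal classifier. The function names `population_weights` and `empirical_weights` are hypothetical.

```python
import random


def population_weights(p_agree, noise_var):
    """Infinite-data least-squares weights for predicting y_i from (a_i, a_j).

    Toy model (an assumption of this sketch): a_i = y_i + noise,
    a_j = y_j + noise, labels balanced in {-1, +1}, P(y_i == y_j) = p_agree.
    """
    rho = 2.0 * p_agree - 1.0            # E[y_i * y_j]
    v = 1.0 + noise_var                  # E[a_i^2] = E[a_j^2]
    det = v * v - rho * rho              # determinant of the feature second-moment matrix
    w_causal = (v - rho * rho) / det     # weight on a_i (causal for task i)
    w_noncausal = rho * noise_var / det  # weight on a_j (non-causal for task i)
    return w_causal, w_noncausal


def empirical_weights(p_agree, noise_var, n, seed=0):
    """Least-squares weights fitted on n samples drawn from the same toy model."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    s11 = s12 = s22 = c1 = c2 = 0.0
    for _ in range(n):
        y_i = rng.choice((-1.0, 1.0))
        y_j = y_i if rng.random() < p_agree else -y_i
        a_i = y_i + rng.gauss(0.0, sd)
        a_j = y_j + rng.gauss(0.0, sd)
        # Accumulate the 2x2 normal equations for regressing y_i on (a_i, a_j).
        s11 += a_i * a_i
        s12 += a_i * a_j
        s22 += a_j * a_j
        c1 += a_i * y_i
        c2 += a_j * y_i
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 system with Cramer's rule.
    return (s22 * c1 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det


if __name__ == "__main__":
    for p in (0.5, 0.8):
        print("p_agree =", p)
        print("  population:", population_weights(p, 1.0))
        print("  n = 20    :", empirical_weights(p, 1.0, 20))
```

In this toy model the infinite-data weight on the non-causal feature is $\rho\sigma^2 / \big((1+\sigma^2)^2 - \rho^2\big)$ with $\rho = 2P(Y_i = Y_j) - 1$, so it vanishes only when the labels are uncorrelated. With a small sample, the fitted weight on the non-causal feature is generally non-zero even in that case, which mirrors the second half of the statement.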

Figures (13)

  • Figure 1: Knowledge transfer to a specific test task. For both positive knowledge transfer ($\mathcal{R}_{i,j}<1$) and negative knowledge transfer ($\mathcal{R}_{i,j}>1$), an exemplar task is shown. Here, we simply use the $\mathcal{R}_{i,j}$ threshold to classify the knowledge transfer as positive or negative. See Subsection 3.2 and Appendix F for more details.
  • Figure 2: Structural Causal Models (SCMs) for two tasks $\tau_i$ and $\tau_j$, where $(X_i, Y_i)$ and $(X_j, Y_j)$ are the samples and corresponding labels of these tasks. Solid lines denote true causal relations, and dotted lines denote spurious correlations. (a) is constructed based on the ground-truth causal mechanism, while (b) can be viewed as the inverse process of the generating mechanism.
  • Figure 3: Ablation study, including (a) sinusoid regression, (b) pose prediction, (c) 5-way 1-shot miniImagenet, and (d) 20-way 1-shot Omniglot. The backbone is MAML. The red, blue, green, and orange bars represent the results of MetaCRL-$\mathcal{L}_{\rm{DM}}(f_{gr},\Xi)$, MetaCRL-$\mathcal{L}_{\rm{DM}}(\Xi)$, MetaCRL-$\mathcal{L}_{\rm{DM}}(f_{gr})$, and MetaCRL, respectively.
  • Figure 4: Knowledge transfer after using MetaCRL.
  • Figure 5: Visualization of the similarity between causal factors.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2