Bayesian Meta-Learning with Expert Feedback for Task-Shift Adaptation through Causal Embeddings

Lotta Mäkinen; Jorge Loría; Samuel Kaski

TL;DR

We propose a causally-aware Bayesian meta-learning method that conditions task-specific priors on precomputed latent causal task embeddings, enabling transfer based on mechanistic similarity rather than spurious correlations.

Abstract

Meta-learning methods perform well on new within-distribution tasks but often fail when adapting to out-of-distribution target tasks, where relying on source tasks can induce negative transfer. We propose a causally-aware Bayesian meta-learning method that conditions task-specific priors on precomputed latent causal task embeddings, enabling transfer based on mechanistic similarity rather than spurious correlations. Our approach explicitly considers realistic deployment settings where access to target-task data is limited and adaptation relies on noisy, expert-provided pairwise judgments of causal similarity between source and target tasks. We provide a theoretical analysis showing that conditioning on causal embeddings controls prior mismatch and mitigates negative transfer under task shift. Empirically, we demonstrate reduced negative transfer and improved out-of-distribution adaptation in both controlled simulations and a large-scale real-world clinical prediction setting for cross-disease transfer, where causal embeddings align with underlying clinical mechanisms.
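As a rough illustration of what conditioning a prior on a task embedding can look like, the sketch below uses the linear map $z \mapsto \theta + Wz$ described in the Figure 1 caption to compute a prior mean per task. The dimensions, the random values, and the names `task_prior_mean` and `prior_gap` are assumptions made for illustration and are not taken from the paper. The check at the end relies only on a standard linear-algebra fact: the distance between two prior means is at most the spectral norm of $W$ times the distance between the embeddings.

```python
import numpy as np


def task_prior_mean(theta, W, z):
    """Prior mean for a task with embedding z under the linear map z -> theta + W z."""
    return theta + W @ z


def prior_gap(theta, W, z1, z2):
    """Euclidean distance between the prior means induced by two embeddings."""
    return np.linalg.norm(task_prior_mean(theta, W, z1) - task_prior_mean(theta, W, z2))


rng = np.random.default_rng(0)
d_param, d_embed = 6, 3                      # hypothetical sizes, chosen for the example
theta = rng.normal(size=d_param)             # shared offset (illustrative values)
W = rng.normal(size=(d_param, d_embed))      # embedding weight matrix (illustrative values)

z_a = rng.normal(size=d_embed)                   # a source-task embedding
z_b = z_a + 0.05 * rng.normal(size=d_embed)      # a task with a nearby embedding
z_c = z_a + 2.0 * rng.normal(size=d_embed)       # a task with a distant embedding

spec = np.linalg.norm(W, 2)                  # spectral norm: largest singular value of W

gap_ab = prior_gap(theta, W, z_a, z_b)
gap_ac = prior_gap(theta, W, z_a, z_c)
bound_ab = spec * np.linalg.norm(z_a - z_b)
bound_ac = spec * np.linalg.norm(z_a - z_c)

print(f"near pair: gap={gap_ab:.4f}, bound={bound_ab:.4f}")
print(f"far pair:  gap={gap_ac:.4f}, bound={bound_ac:.4f}")
```

Under this map, tasks whose embeddings are close necessarily receive close prior means, which is the sense in which embedding distance can limit how far a transferred prior drifts from the target task.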

Paper Structure (52 sections, 2 theorems, 51 equations, 8 figures, 8 tables, 4 algorithms)


Introduction
Preliminaries
Bayesian Meta-Learning
Structural Causal Models
Related Work
Meta-learning under task-level distribution shift.
Causal inference and task heterogeneity.
Expert knowledge in learning systems.
Incorporating Causal Embeddings to a Bayesian Meta-Learning Model
Causal Embeddings
Causally-Aware Bayesian Meta-Learning
Expert-Guided Inference of Target Task Embeddings
Error Decomposition and Negative Transfer Under Task-Level Shift
Evaluation under Task-level Shifts
Synthetic Setting
...and 37 more sections

Key Result

Proposition 4

Assume the loss is bounded, $\ell(\phi;x,y)\in[0,M]$. If $z_1$ and $z_2$ are $\varepsilon$-similar, then for any task $t$, […]. For task $t'$, the mismatch satisfies […], where $\|W\|$ is the spectral norm of the embedding weight matrix, ${\varepsilon_{\mathrm{OOD}} = \|z_{t'}-\bar{z}\|}, {\varepsilon_{\mathrm{causal}} = \|\tilde{z}_{t'} - z_{t'}\|}$, and ${\varepsilon_{\mathrm{expert}} = \|\hat{z}_{t'} -

Figures (8)

Figure 1: Illustration of the causal embedding space (left) and the mapping to the parameter space (right). Each task $t$ has an embedding $z_t$, encoding its causal structure. The linear map $z \mapsto \theta + Wz$ transforms embeddings into task-specific priors, enabling tasks with similar causal mechanisms to have similar priors.
Figure 2: Performance of meta-learning models under increasing task shift for Experiment 1. a) AUROC as a function of the distribution shift $\varepsilon_\mathrm{OOD}$. b) Change in log loss relative to the no-transfer baseline (BNN), where negative values indicate positive transfer and positive values indicate negative transfer. Error bars denote the standard deviation across 30 runs.
Figure 3: Method comparison with expert-inferred embeddings for Experiment 2 across 30 runs. a) Average AUROC of our method and the baselines across increasing levels of distribution shift $\varepsilon_\mathrm{OOD}$; error bars denote the standard deviation. b) Effect of the expert query budget on prediction performance for a single task (${\varepsilon=4.0}$).
Figure 4: Performance of causally-aware meta-learning models under increasing noise in the causal task embeddings.
Figure 5: Expert-noise sensitivity under active querying. RMSE as a function of the number of expert queries at different noise levels $\tau_\text{expert}$ for multiple target tasks ordered with increasing distribution shift. Lower $\tau_\text{expert}$ corresponds to noisier expert feedback. The final panel aggregates results across all target tasks.
...and 3 more figures

Theorems & Definitions (9)

Definition 1
Definition 2
Proposition 4
Theorem 5
proof
proof
Remark 6
Definition 7
proof
