Transfer Learning in Latent Contextual Bandits with Covariate Shift Through Causal Transportability

Mingwei Deng, Ville Kyrki, Dominik Baumann

TL;DR

This work tackles transfer learning for latent contextual bandits under covariate shift by grounding knowledge transfer in causal transportability. It shows that naively transferring all learned knowledge can cause negative transfer, and then develops two linked strategies: a binary-posterior restoration with a closed-form solution, and a high-dimensional proxy approach using a CEVAE-like variational autoencoder with a transport-aware objective. The resulting methods show improved sample efficiency and robust transfer on synthetic and semi-synthetic datasets, including IHDP and MNIST-based proxies, while avoiding the degradation caused by misaligned context distributions. Overall, the approach provides a principled framework for identifying and transferring invariant causal knowledge across environments, with potential implications for data-efficient decision-making under distribution shift in bandit and, possibly, reinforcement learning settings.
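The idea of restoring a binary posterior in closed form can be illustrated in the simplest case. The sketch below is ours, not the authors' algorithm. It assumes a binary latent context $Z$, a binary proxy $W$ whose likelihood $p(W=1\vert Z=z)$ is shared by both domains, and a prior $P(Z=1)$ that shifts from source to target. The names `restored_posterior` and `P_W1_GIVEN_Z`, and all numbers, are hypothetical.

```python
"""Hypothetical binary model (our assumption, not the paper's code):
latent context Z in {0, 1}, binary proxy W with likelihood p(W=1 | Z=z)
shared by both domains, and a prior P(Z=1) that differs across domains."""


def restored_posterior(prior_z1: float, p_w1_given_z: tuple[float, float], w: int) -> float:
    """Return p(Z=1 | W=w) under the given prior via Bayes' rule (closed form)."""
    # Likelihood of the observed proxy value under Z=0 and under Z=1.
    lik0 = p_w1_given_z[0] if w == 1 else 1.0 - p_w1_given_z[0]
    lik1 = p_w1_given_z[1] if w == 1 else 1.0 - p_w1_given_z[1]
    num = lik1 * prior_z1
    return num / (num + lik0 * (1.0 - prior_z1))


P_W1_GIVEN_Z = (0.2, 0.8)  # illustrative: the proxy agrees with Z 80% of the time
source_prior, target_prior = 0.2, 0.8  # covariate shift affects P(Z) only

for w in (0, 1):
    src = restored_posterior(source_prior, P_W1_GIVEN_Z, w)
    tgt = restored_posterior(target_prior, P_W1_GIVEN_Z, w)
    print(f"w={w}: source p(Z=1|w)={src:.3f}, restored target p*(Z=1|w)={tgt:.3f}")
```

With the source prior, observing $W=1$ leaves the context ambiguous (posterior 0.5); with the target prior, the same observation points strongly to $Z=1$ (about 0.94). A posterior learned in the source domain therefore cannot be reused as-is, while the invariant likelihood can be combined with a new prior.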

Abstract

Transferring knowledge from one environment to another is an essential ability of intelligent systems. Nevertheless, when two environments differ, naively transferring all knowledge may degrade performance, a phenomenon known as negative transfer. In this paper, we address this issue within the framework of multi-armed bandits from the perspective of causal inference. Specifically, we consider transfer learning in latent contextual bandits, where the actual context is hidden but a potentially high-dimensional proxy is observable. We further consider a covariate shift in the context across environments. We show that, in this setting, naively transferring all knowledge to classical bandit algorithms leads to negative transfer. We then leverage transportability theory from causal inference to develop algorithms that explicitly transfer the knowledge that remains valid for estimating the causal effects of interest in the target environment. In addition, we use variational autoencoders to approximate causal effects in the presence of a high-dimensional proxy. We test our algorithms on synthetic and semi-synthetic datasets and empirically demonstrate consistently improved learning efficiency over baseline algorithms across different proxies, showing the effectiveness of our causal framework for transferring knowledge.
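Negative transfer can be reproduced with exact arithmetic in a toy binary model. This is a hypothetical example, not the paper's experiment. It assumes the graph $Z \to W$, $Z \to Y$, $X \to Y$, with $p(W\vert Z)$ and $p(Y=1\vert Z, do(X))$ invariant across domains and only $P(Z=1)$ shifting. A greedy policy built from source-domain values of $p(Y=1\vert W=w, do(X=x))$ stands in for a classical bandit algorithm warm-started on source data; the names `MU`, `posterior_z1`, `arm_values`, and all numbers are illustrative.

```python
"""Toy illustration of negative transfer (hypothetical numbers, not from the paper).

Assumed binary model: Z -> W (proxy), Z -> Y, X -> Y. The agent observes only W.
p(W | Z) and p(Y | Z, do(X)) are shared across domains; only P(Z=1) shifts.
"""

P_W1_GIVEN_Z = (0.2, 0.8)
# MU[z][x] = p(Y=1 | Z=z, do(X=x)); arm 0 suits Z=0, arm 1 suits Z=1.
MU = ((0.7, 0.4), (0.2, 0.6))


def posterior_z1(prior_z1: float, w: int) -> float:
    """p(Z=1 | W=w) by Bayes' rule under the given prior."""
    lik = [p if w == 1 else 1.0 - p for p in P_W1_GIVEN_Z]
    num = lik[1] * prior_z1
    return num / (num + lik[0] * (1.0 - prior_z1))


def arm_values(prior_z1: float, w: int) -> list[float]:
    """p(Y=1 | W=w, do(X=x)) = sum_z p(Y=1 | z, do(x)) p(z | w), for x in {0, 1}."""
    pz1 = posterior_z1(prior_z1, w)
    return [MU[0][x] * (1.0 - pz1) + MU[1][x] * pz1 for x in (0, 1)]


source_prior, target_prior = 0.2, 0.8
for w in (0, 1):
    naive = arm_values(source_prior, w)  # what a source-trained agent believes
    truth = arm_values(target_prior, w)  # what actually holds in the target
    naive_arm = naive.index(max(naive))
    best_arm = truth.index(max(truth))
    regret = truth[best_arm] - truth[naive_arm]
    print(f"w={w}: naive arm {naive_arm}, target-optimal arm {best_arm}, "
          f"per-round regret {regret:.3f}")
```

For $W=0$, the source-trained policy keeps choosing arm 0 although arm 1 is better in the target, losing 0.05 expected reward on every such round. Re-weighting the same invariant quantities with the target prior, as `arm_values(target_prior, w)` does, recovers the optimal arm, in line with the idea of transferring only the causal components that stay invariant.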

Paper Structure

This paper contains 29 sections, 9 theorems, 23 equations, 6 figures, 2 tables.

Key Result

Proposition 3

For an offline contextual bandit under covariate shift, the $z$-specific causal effect $p(y\vert z, do(x))$, where $z\in \mathcal{Z}$, $x \in \mathcal{X}$ and $y \in \mathbb{R}$, is directly transportable. The transport formula of $p^{\ast}(y\vert z, do(x))$ is
$$p^{\ast}(y\vert z, do(x)) = p(y\vert z, do(x)).$$
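One consequence for the latent-context setting can be worked out explicitly. This derivation is ours, under stated assumptions rather than the paper's exact statement: the selection node $S$ points only to $Z$, the proxy $W$ has $Z$ as its only parent, $W$ has no edge into $Y$, and $Z$ is not a descendant of $X$. Then $p(w\vert z)$ is also invariant, $Y$ is independent of $W$ given $Z$ under $do(x)$, and the target effect given the observable proxy follows by conditioning on $Z$ and using the direct transportability of $p(y\vert z, do(x))$:

```latex
\begin{aligned}
p^{\ast}(y \vert w, do(x))
  &= \sum_{z} p^{\ast}(y \vert z, w, do(x))\, p^{\ast}(z \vert w, do(x)) \\
  &= \sum_{z} p(y \vert z, do(x))\, p^{\ast}(z \vert w), \\
p^{\ast}(z \vert w)
  &= \frac{p(w \vert z)\, p^{\ast}(z)}{\sum_{z'} p(w \vert z')\, p^{\ast}(z')}.
\end{aligned}
```

Under these assumptions, only the context distribution $p^{\ast}(z)$ has to be learned in the target domain; every other factor is transported from the source.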

Figures (6)

  • Figure 1: Causal graphs of the data-generating processes. Gray nodes are observable and white nodes are unobservable. Graphs ($a$) and ($b$) depict the causal graphs in the source domain and the target domain, respectively. Graph ($c$) is the corresponding selection diagram for the two domains: the selection node $S$ points to $Z$ since $P_{Z}\neq P^{\ast}_{Z}$. Graph ($d$) is a selection diagram encoding the domain discrepancy of an offline contextual bandit under covariate shift.
  • Figure 2: Negative transfer for classical bandit algorithms. Fig. ($a$) shows negative transfer for the binary model; agents in this experiment use Thompson sampling thompson1933likelihood. Fig. ($b$) shows negative transfer for the linear Gaussian model with a scalar latent context $Z$ and a 5-dimensional proxy $W$; agents in this experiment use LinUCB li2010contextual.
  • Figure 3: Architecture of the VAE with transportability.
  • Figure 4: Cumulative regret and distribution comparison for the binary model. Each column shows one setting, in which the context $Z$ follows a different Bernoulli distribution. The top row shows the cumulative regret, averaged over 100 simulations, of the proposed binary-model algorithm and the baselines. The bottom row shows the estimated distribution of $Z$ (blue) converging to the true distribution of $Z$ in the target domain (dashed black); the distribution of $Z$ in the source domain is shown in solid black.
  • Figure 5: Total cumulative regret and distribution comparison for the IHDP and MNIST datasets. The first two columns show results for the IHDP dataset, and the last two for the MNIST dataset. More results can be found in the appendix.
  • ...and 1 more figure
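The bottom row of Figure 4 tracks an estimate of the target distribution of $Z$ converging to the truth. To illustrate why this is possible even though $Z$ is latent, the sketch below swaps in a simple method-of-moments estimator, which is not necessarily the estimator used in the paper. It assumes $p(W=1\vert Z)$ is invariant and known from the source domain, with $p(W=1\vert Z=1) \neq p(W=1\vert Z=0)$; then $P^{\ast}(Z=1)$ follows from the proxy marginal via $P^{\ast}(W=1) = p(W=1\vert Z=0) + P^{\ast}(Z=1)\,[p(W=1\vert Z=1) - p(W=1\vert Z=0)]$. The names `estimate_target_prior`, `P_W1_GIVEN_Z`, `TARGET_PRIOR`, and all numbers are hypothetical.

```python
import numpy as np

# Hypothetical binary proxy model (illustrative values, not from the paper).
P_W1_GIVEN_Z = np.array([0.2, 0.8])  # p(W=1 | Z=0), p(W=1 | Z=1); assumed known from the source
TARGET_PRIOR = 0.8                    # true P*(Z=1), unknown to the agent

rng = np.random.default_rng(0)


def estimate_target_prior(w_samples: np.ndarray) -> float:
    """Method-of-moments estimate of P*(Z=1) from target-domain proxy samples.

    Inverts P*(W=1) = p(W=1|Z=0) + P*(Z=1) * (p(W=1|Z=1) - p(W=1|Z=0))
    and clips the result to the valid range [0, 1].
    """
    p_w1 = w_samples.mean()
    est = (p_w1 - P_W1_GIVEN_Z[0]) / (P_W1_GIVEN_Z[1] - P_W1_GIVEN_Z[0])
    return float(np.clip(est, 0.0, 1.0))


# Simulate target-domain rounds: draw the latent Z, then the observed proxy W.
z = rng.random(5000) < TARGET_PRIOR
w = (rng.random(5000) < P_W1_GIVEN_Z[z.astype(int)]).astype(float)

for t in (50, 500, 5000):
    print(f"after {t:>4} rounds: estimated P*(Z=1) = {estimate_target_prior(w[:t]):.3f}")
```

The estimate tightens as target rounds accumulate. Its variance grows as the two likelihoods approach each other, because the denominator $p(W=1\vert Z=1) - p(W=1\vert Z=0)$ shrinks; a nearly uninformative proxy makes the target context distribution hard to recover.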

Theorems & Definitions (13)

  • Definition 2: Transportability
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Remark 6
  • Corollary 7
  • Remark 8
  • Proposition
  • Proposition
  • Remark 9
  • ...and 3 more