Robust Transfer Learning with Side Information

Akram S. Awad, Shihab Ahmed, Yue Wang, George K. Atia

TL;DR

This paper proposes a framework for transfer under environment shift that derives a robust target-domain policy from estimate-centered uncertainty sets. The sets are constructed through constrained estimation that combines limited target samples with side information about the source-target dynamics.

Abstract

Robust Markov Decision Processes (MDPs) address environmental shift through distributionally robust optimization (DRO) by finding an optimal worst-case policy within an uncertainty set of transition kernels. However, standard DRO approaches require enlarging the uncertainty set under large shifts, which leads to overly conservative policies. In this paper, we propose a framework for transfer under environment shift that derives a robust target-domain policy via estimate-centered uncertainty sets, constructed through constrained estimation that integrates limited target samples with side information about the source-target dynamics. The side information includes bounds on feature moments, distributional distances, and density ratios, yielding improved kernel estimates and tighter uncertainty sets. Error bounds and convergence results are established for both robust and non-robust value functions. Moreover, we provide a finite-sample guarantee on the learned robust policy and analyze the robust sub-optimality gap. Under mild low-dimensional structure on the transition model, the side information reduces this gap and improves sample efficiency. We assess the performance of our approach across OpenAI Gym environments and classic control problems, consistently demonstrating superior target-domain performance over state-of-the-art robust and non-robust baselines.
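To make the idea of an estimate-centered uncertainty set concrete, the following is a minimal sketch of robust value iteration over a tabular MDP. It is not the paper's algorithm: it assumes, purely for illustration, an (s, a)-rectangular total-variation ball of a fixed radius around an estimated kernel `P_hat`, and the function names `worst_case_expectation` and `robust_value_iteration` are hypothetical. It shows only why a smaller radius around a better estimate gives a less pessimistic value.

```python
import numpy as np

def worst_case_expectation(p_hat, v, radius):
    """Minimize q @ v over distributions q with TV(q, p_hat) <= radius.

    Assumption for illustration: TV(q, p) = 0.5 * ||q - p||_1. The minimizer
    moves up to `radius` probability mass from the highest-value states onto
    the lowest-value state.
    """
    q = np.asarray(p_hat, dtype=float).copy()
    lo = int(np.argmin(v))
    budget = min(radius, 1.0 - q[lo])   # mass that can actually be moved
    q[lo] += budget
    for s in np.argsort(v)[::-1]:       # drain highest-value states first
        if budget <= 0:
            break
        if s == lo:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        budget -= moved
    return float(q @ v)

def robust_value_iteration(P_hat, R, gamma, radius, iters=500):
    """Robust Bellman iteration with a TV ball centered at P_hat[s, a].

    P_hat has shape (S, A, S); R has shape (S, A) with rewards in [0, 1].
    """
    n_states, n_actions, _ = P_hat.shape
    v = np.zeros(n_states)
    q_sa = np.zeros((n_states, n_actions))
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                q_sa[s, a] = R[s, a] + gamma * worst_case_expectation(
                    P_hat[s, a], v, radius
                )
        v = q_sa.max(axis=1)
    return v, q_sa.argmax(axis=1)

# Usage on a small random MDP (synthetic data, for illustration only).
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(4), size=(4, 2))   # shape (4, 2, 4)
R = rng.uniform(0.0, 1.0, size=(4, 2))
v_nominal, _ = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.0)
v_small, _ = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.05)
v_large, _ = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.4)
```

In this sketch a larger radius can only lower the robust value, which mirrors the abstract's point that enlarging the set to cover a distant target domain makes the resulting policy more conservative.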

Paper Structure

This paper contains 41 sections, 12 theorems, 84 equations, 20 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

For rewards in $[0,1]$ and any $\gamma\in(0,1)$,

Figures (20)

  • Figure 1: (a) Environment shift: the source and target domain environments are relatively distant. (b) Over-conservative case: the source uncertainty set's radius is enlarged to include the target domain, which leads to an overly conservative policy. (c) Our approach: the uncertainty set is constructed around the estimated target dynamics, which are closer to the true target dynamics, so a smaller radius suffices.
  • Figure 2: Target domain performance for the non-robust setting as a function of sample size for CartPole. The legend is shared between Figures 2 and 3.
  • Figure 3: Target domain performance for the robust setting as a function of sample size for CartPole.
  • Figure 4: Suboptimality gap as a function of sample size $(N)$ (log-log scale) in the CartPole environment for LDS-IBE, for non-robust (left) and robust (right) scenarios.
  • Figure 5: A depiction of the OpenAI Gym Toy Text environments: (a) Frozen Lake, (b) Taxi, (c) Cliff Walking.
  • ...and 15 more figures

Theorems & Definitions (27)

  • Remark 1
  • Theorem 1: Training error
  • Theorem 2: Evaluation error
  • Corollary 3: Consistency
  • Proposition 4: TV-consistency of IBE
  • Lemma 6: Finite-sample radius for LDS-IBE
  • Theorem 7: Suboptimality gap
  • Definition 8: Total variation distance (Levin, Peres, and Wilmer, 2017)
  • Definition 9: Wasserstein-1 (Earth Mover's) distance (Villani, 2008)
  • Definition 10: Value-Aware 1-Wasserstein distance
  • ...and 17 more
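
For reference, the two standard distances named in Definitions 8 and 9 are usually written as follows. These are the common textbook forms, given here as an assumption about what the definitions state; the paper's exact wording and its value-aware variant (Definition 10) are not reproduced in this listing. Here $P$ and $Q$ are distributions on a countable state space $\mathcal{S}$, $d$ is a metric on $\mathcal{S}$, and $\Pi(P,Q)$ is the set of couplings with marginals $P$ and $Q$.

$$
d_{\mathrm{TV}}(P,Q) \;=\; \sup_{A\subseteq\mathcal{S}} \lvert P(A)-Q(A)\rvert \;=\; \frac{1}{2}\sum_{s\in\mathcal{S}} \lvert P(s)-Q(s)\rvert
$$

$$
W_1(P,Q) \;=\; \inf_{\pi\in\Pi(P,Q)} \; \mathbb{E}_{(x,y)\sim\pi}\bigl[d(x,y)\bigr]
$$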