Robust Transfer Learning with Side Information

Akram S. Awad; Shihab Ahmed; Yue Wang; George K. Atia

TL;DR

We propose a framework for transfer under environment shift that derives a robust target-domain policy from estimate-centered uncertainty sets. The sets are built by constrained estimation that combines limited target samples with side information about the source-target dynamics.

Abstract

Robust Markov Decision Processes (MDPs) address environment shift through distributionally robust optimization (DRO) by finding an optimal worst-case policy within an uncertainty set of transition kernels. However, standard DRO approaches require enlarging the uncertainty set under large shifts, which leads to overly conservative and pessimistic policies. In this paper, we propose a framework for transfer under environment shift that derives a robust target-domain policy via estimate-centered uncertainty sets, constructed through constrained estimation that integrates limited target samples with side information about the source-target dynamics. The side information includes bounds on feature moments, distributional distances, and density ratios, yielding improved kernel estimates and tighter uncertainty sets. Error bounds and convergence results are established for both robust and non-robust value functions. Moreover, we provide a finite-sample guarantee on the learned robust policy and analyze the robust sub-optimality gap. Under mild low-dimensional structure on the transition model, the side information reduces this gap and improves sample efficiency. We assess the performance of our approach across OpenAI Gym environments and classic control problems, consistently demonstrating superior target-domain performance over state-of-the-art robust and non-robust baselines.
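To make the idea of a worst-case policy over an estimate-centered uncertainty set concrete, the following is a minimal sketch of tabular robust value iteration. It is not the paper's algorithm: the L1 (total-variation-style) ball, the names `worst_case_expectation` and `robust_value_iteration`, and the parameters `radius`, `gamma`, and `iters` are all assumptions chosen for illustration, and the estimated kernel `P_hat` is simply taken as given rather than produced by the constrained estimation described above.

```python
import numpy as np

def worst_case_expectation(p_hat, v, radius):
    """Minimize q @ v over distributions q with ||q - p_hat||_1 <= radius.

    Assumption: the uncertainty set is an L1 ball centered at the estimate p_hat.
    """
    q = np.asarray(p_hat, dtype=float).copy()
    lo = int(np.argmin(v))
    # Mass that can be shifted onto the lowest-value next state.
    shift = min(radius / 2.0, 1.0 - q[lo])
    q[lo] += shift
    # Remove the same amount of mass, starting from the highest-value states.
    for s in np.argsort(v)[::-1]:
        if shift <= 0:
            break
        if s == lo:
            continue
        take = min(q[s], shift)
        q[s] -= take
        shift -= take
    return float(q @ v)

def robust_value_iteration(P_hat, R, radius, gamma=0.9, iters=500):
    """Robust value iteration around an estimated kernel.

    P_hat: array of shape (S, A, S), estimated transition kernel (hypothetical input).
    R:     array of shape (S, A), rewards in [0, 1].
    Returns the robust value function and a greedy robust policy.
    """
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    q_values = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                q_values[s, a] = R[s, a] + gamma * worst_case_expectation(
                    P_hat[s, a], v, radius
                )
        v = q_values.max(axis=1)
    return v, q_values.argmax(axis=1)
```

With `radius=0` the sketch reduces to standard value iteration on `P_hat`; increasing `radius` can only lower the returned values, which reflects the conservatism that a smaller, better-centered set is meant to reduce.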

Paper Structure (41 sections, 12 theorems, 84 equations, 20 figures, 3 tables, 2 algorithms)

This paper contains 41 sections, 12 theorems, 84 equations, 20 figures, 3 tables, 2 algorithms.

Introduction
Preliminaries and Problem Setup
Problem setup.
Main Approach
Information-Based Estimation
Theoretical Analysis
IBE for Transfer
Suboptimality Gap and Finite-Sample Guarantees
Related Work
Numerical Experiments
Conclusion
Definitions
Cramér–Rao Bounds with Side Information
Value-Aware Side Information.
Deriving Side Information from System Knowledge
...and 26 more sections

Key Result

Theorem 1

For rewards in $[0,1]$ and any $\gamma\in(0,1)$,

Figures (20)

Figure 1: (a) Environment shift: The source and target domain environments are relatively distant. (b) Over-conservative case: the source uncertainty set's radius is enlarged to include the target domain, which leads to an overly conservative policy. (c) Our approach: We construct the uncertainty set around the estimated target dynamics, which are closer to the true target dynamics and therefore the set requires a smaller radius.
Figure 2: Target domain performance for the non-robust setting as a function of sample size for CartPole. The legend is shared between Figures 2 and 3.
Figure 3: Target domain performance for the robust setting as a function of sample size for CartPole.
Figure 4: Suboptimality gap as a function of sample size $(N)$ (log-log scale) in the CartPole environment for LDS-IBE, for non-robust (left) and robust (right) scenarios.
Figure 5: A depiction of OpenAI Gym toy text environments: (a) Frozen Lake environment, (b) Taxi environment, (c) Cliff Walking environment.
...and 15 more figures

Theorems & Definitions (27)

Remark 1
Theorem 1: Training error
Theorem 2: Evaluation error
Corollary 3: Consistency
Proposition 4: TV-consistency of IBE
Lemma 6: Finite-sample radius for LDS-IBE
Theorem 7: Suboptimality gap
Definition 8: Total variation distance (Levin, Peres, and Wilmer, 2017)
Definition 9: Wasserstein-1 (Earth Mover's) distance (Villani, 2008)
Definition 10: Value-Aware 1-Wasserstein distance
...and 17 more
