Federated Offline Policy Learning

Aldo Gael Carranza; Susan Athey

TL;DR

A novel regret analysis establishes finite-sample upper bounds on two distinct notions of regret, global regret for all data sources in aggregate and local regret for any given data source, each characterized by expressions of source heterogeneity and distribution shift.

Abstract

We consider the problem of learning personalized decision policies from observational bandit feedback data across multiple heterogeneous data sources. In our approach, we introduce a novel regret analysis that establishes finite-sample upper bounds on two distinct notions of regret: global regret for all data sources in aggregate, and local regret for any given data source. We characterize these regret bounds by expressions of source heterogeneity and distribution shift. Moreover, we examine the practical considerations of this problem in the federated setting, where a central server aims to train a policy on data distributed across the heterogeneous sources without collecting any of their raw data. We present a policy learning algorithm amenable to federation based on the aggregation of local policies trained with doubly robust offline policy evaluation strategies. Our analysis and supporting experimental results provide insights into the tradeoffs of including heterogeneous data sources in offline policy learning.
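The doubly robust offline policy evaluation step mentioned in the abstract can be sketched for a single data source as follows. This is a minimal illustration of the standard AIPW (augmented inverse propensity weighting) estimator, assuming known logging propensities, a user-supplied outcome model, and a binary action space; the function names and the simulated logging process are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# (context x, logged action a, observed reward y)
Obs = Tuple[float, int, float]


def dr_scores(x: float, a: int, y: float,
              propensity: Callable[[float, int], float],
              mu: Callable[[float, int], float],
              actions: Sequence[int]) -> Dict[int, float]:
    """Doubly robust (AIPW) score for every candidate action at one observation."""
    scores = {}
    for act in actions:
        pred = mu(x, act)  # outcome-model prediction for action `act`
        # The inverse-propensity correction only applies to the logged action.
        correction = (y - pred) / propensity(x, act) if act == a else 0.0
        scores[act] = pred + correction
    return scores


def dr_policy_value(data: Sequence[Obs], policy: Callable[[float], int],
                    propensity: Callable[[float, int], float],
                    mu: Callable[[float, int], float],
                    actions: Sequence[int]) -> float:
    """Estimate a policy's value as the mean DR score of the action it picks."""
    total = sum(dr_scores(x, a, y, propensity, mu, actions)[policy(x)]
                for x, a, y in data)
    return total / len(data)


def simulate(n: int, seed: int = 0) -> List[Obs]:
    """Hypothetical logging process: actions drawn uniformly at random,
    reward x for action 1 and 0.5 for action 0, plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        a = rng.randint(0, 1)  # each action logged with probability 0.5
        y = (x if a == 1 else 0.5) + rng.gauss(0.0, 0.1)
        data.append((x, a, y))
    return data


if __name__ == "__main__":
    data = simulate(20_000)
    estimate = dr_policy_value(
        data,
        policy=lambda x: 1 if x > 0.5 else 0,
        propensity=lambda x, a: 0.5,  # known uniform logging policy
        mu=lambda x, a: 0.0,          # deliberately crude outcome model
        actions=(0, 1),
    )
    # The true value is E[max(x, 0.5)] = 0.625 for x ~ Uniform(0, 1).
    print(f"DR estimate: {estimate:.3f}")
```

Because the outcome model here is deliberately crude, the estimator reduces to inverse propensity weighting and remains unbiased thanks to the known propensities; a more accurate outcome model would reduce its variance.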

Paper Structure

This paper contains 69 sections, 18 theorems, 141 equations, 4 figures, 3 algorithms.

Introduction
Related Work
  Offline Policy Learning
  Federated Learning
Preliminaries
  Setting
  Objective
  Data-Generating Processes
  Data Assumptions
Approach
  Nuisance Parameters
  Policy Value Estimator
  Optimization Objective
Regret Bounds
  Complexity and Skewness
...and 54 more sections

Key Result

Theorem 5.3

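Suppose the assumptions on the data-generating process (ass:dgp), local data size scaling (ass:LocalDataSizeScaling), and finite-sample error (ass:FiniteSampleError) hold. Then, with probability at least $1-\delta$, the global regret satisfies the bound …, where … and $c_1,c_2$ are universal constants.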
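To make the distinction between global and local regret concrete, here is a toy computation with exact (population) policy values on two sources. It assumes the global objective is a mixture of source-level policy values weighted by mixture weights $\lambda$ (the symbol used in the figure captions); the policy names and values are made up, and the sketch does not reproduce the theorem's bound.

```python
from typing import Dict, Sequence

# policy name -> (value on source 0, value on source 1, ...)
PolicyValues = Dict[str, Sequence[float]]


def mixture_value(per_source: Sequence[float], lam: Sequence[float]) -> float:
    """Assumed global objective: sum_c lam_c * V_c(pi)."""
    return sum(w * v for w, v in zip(lam, per_source))


def global_regret(values: PolicyValues, chosen: str, lam: Sequence[float]) -> float:
    """Gap between the best policy under the mixture and the chosen policy."""
    best = max(mixture_value(v, lam) for v in values.values())
    return best - mixture_value(values[chosen], lam)


def local_regret(values: PolicyValues, chosen: str, source: int) -> float:
    """Gap between the best policy for one source and the chosen policy there."""
    best = max(v[source] for v in values.values())
    return best - values[chosen][source]


if __name__ == "__main__":
    # Made-up population values for three candidate policies on two sources.
    values = {"pi_a": (1.0, 0.0), "pi_b": (0.0, 1.0), "pi_c": (0.75, 0.75)}
    lam = (0.5, 0.5)
    print(global_regret(values, "pi_c", lam))  # prints 0.0: globally optimal
    print(local_regret(values, "pi_c", 0))     # prints 0.25: not optimal for source 0
```

The example shows why the two notions are reported separately: a policy with zero global regret can still incur positive local regret on a source whose distribution differs from the mixture.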

Figures (4)

Figure 1: Empirical regret curves under homogeneous clients. All local regrets shown are for client $c=1$.
Figure 2: Empirical regret curves under heterogeneous clients. Top: $\lambda=\bar{n}$; Bottom: $\lambda=\bar{n}+\bar{\varepsilon}$. All local regrets shown are for client $c=1$.
Figure 3: Empirical regret curves for simulation experiments. Local regrets are for client 2.
Figure : FedOPL: Server-Side
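The server side of a federated scheme like FedOPL can be sketched as one broadcast-train-aggregate round. This sketch assumes a FedAvg-style rule in which the server takes a weighted average of locally trained policy parameter vectors; the paper's actual aggregation rule, weights, and policy class may differ, and `aggregate`, `server_round`, and the toy clients are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Params = List[float]
# A client takes the current global parameters and returns
# (locally trained parameters, aggregation weight such as its sample size).
Client = Callable[[Params], Tuple[Params, float]]


def aggregate(local_params: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of local parameter vectors (FedAvg-style)."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    agg = [0.0] * len(local_params[0])
    for params, w in zip(local_params, weights):
        for j, theta in enumerate(params):
            agg[j] += (w / total) * theta
    return agg


def server_round(global_params: Params, clients: Sequence[Client]) -> Params:
    """One round: broadcast, train locally, aggregate.
    Only parameters and weights reach the server, never raw data."""
    updates = [client(list(global_params)) for client in clients]
    return aggregate([p for p, _ in updates], [w for _, w in updates])


if __name__ == "__main__":
    # Hypothetical clients whose "local training" just nudges one coordinate.
    clients = [lambda g: ([g[0] + 1.0, g[1]], 300),
               lambda g: ([g[0], g[1] + 1.0], 100)]
    print(server_round([0.0, 0.0], clients))  # prints [0.75, 0.25]
```

Weighting by sample size is the usual FedAvg choice; passing mixture weights instead would target a different global objective.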

Theorems & Definitions (39)

Definition 3.1
Definition 3.2
Definition 4.1
Definition 5.1: Entropy integral
Definition 5.2: Skewness
Theorem 5.3: Global Regret Bound
Theorem 5.4: Local Regret Bound
Theorem 5.5: Local Distribution Shift Bound
Lemma 1.1: Hoeffding's inequality
Lemma 1.2: Talagrand's inequality
...and 29 more
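Definition 5.2 concerns skewness. A minimal sketch, assuming the definition used in agnostic federated learning, $s(\lambda \,\|\, \bar{n}) = \sum_c \lambda_c^2 / \bar{n}_c$, where $\bar{n}_c$ is the fraction of all samples held by source $c$; the paper's exact definition may differ, and the function name is hypothetical. By the Cauchy-Schwarz inequality, $s \ge 1$ with equality exactly when $\lambda = \bar{n}$.

```python
from typing import Sequence


def skewness(lam: Sequence[float], sizes: Sequence[int]) -> float:
    """s(lam || n_bar) = sum_c lam_c**2 / n_bar_c, with n_bar_c = n_c / n."""
    if abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    n = sum(sizes)
    return sum(w * w / (n_c / n) for w, n_c in zip(lam, sizes))


if __name__ == "__main__":
    sizes = [200, 100, 100]
    print(skewness([0.5, 0.25, 0.25], sizes))      # lam = n_bar: prints 1.0
    print(skewness([1 / 3, 1 / 3, 1 / 3], sizes))  # uniform lam: about 1.111
```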
