
Federated Offline Policy Learning

Aldo Gael Carranza, Susan Athey

TL;DR

A novel regret analysis is introduced that establishes finite-sample upper bounds on distinguishing notions of global regret for all data sources on aggregate and of local regret for any given data source, characterized by expressions of source heterogeneity and distribution shift.

Abstract

We consider the problem of learning personalized decision policies from observational bandit feedback data across multiple heterogeneous data sources. In our approach, we introduce a novel regret analysis that establishes finite-sample upper bounds on distinguishing notions of global regret for all data sources on aggregate and of local regret for any given data source. We characterize these regret bounds by expressions of source heterogeneity and distribution shift. Moreover, we examine the practical considerations of this problem in the federated setting where a central server aims to train a policy on data distributed across the heterogeneous sources without collecting any of their raw data. We present a policy learning algorithm amenable to federation based on the aggregation of local policies trained with doubly robust offline policy evaluation strategies. Our analysis and supporting experimental results provide insights into tradeoffs in the participation of heterogeneous data sources in offline policy learning.
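The pipeline the abstract describes (local training with doubly robust scores, server-side aggregation, no raw data leaving a source) can be illustrated with a minimal Python sketch. This is a simplified stand-in, not the paper's FedOPL algorithm. It assumes a small finite class of threshold policies on a scalar context, binary actions, a known logging propensity, and a given outcome model. All names (`dr_scores`, `local_values`, `server_select`) and the toy data are hypothetical. Each client computes standard AIPW (doubly robust) scores and reports only per-policy value estimates. The server then maximizes a weighted average of those estimates, here with weights proportional to local sample sizes. To keep the example short, the server aggregates value estimates rather than averaging trained local policies.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: a policy maps a scalar context to an action.
Policy = Callable[[float], int]
Model = Callable[[float, int], float]  # (x, action) -> prediction


def dr_scores(x: float, a: int, y: float, mu_hat: Model, e_hat: Model,
              n_actions: int = 2) -> List[float]:
    """Doubly robust (AIPW) score for every action b:
    mu_hat(x, b) + 1{a == b} * (y - mu_hat(x, b)) / e_hat(x, b)."""
    scores = []
    for b in range(n_actions):
        residual = (y - mu_hat(x, b)) / e_hat(x, b) if a == b else 0.0
        scores.append(mu_hat(x, b) + residual)
    return scores


def local_values(data: Sequence[Tuple[float, int, float]],
                 policies: Sequence[Policy], mu_hat: Model,
                 e_hat: Model) -> List[float]:
    """Client side: DR value estimate of each candidate policy on local data.
    Only these per-policy averages leave the client, never raw (x, a, y)."""
    totals = [0.0] * len(policies)
    for x, a, y in data:
        gamma = dr_scores(x, a, y, mu_hat, e_hat)
        for k, pi in enumerate(policies):
            totals[k] += gamma[pi(x)]
    return [t / len(data) for t in totals]


def server_select(client_values: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Server side: index of the policy maximizing sum_c w_c * V_c(pi) / sum_c w_c."""
    total_w = sum(weights)
    n_policies = len(client_values[0])
    agg = [sum(w * v[k] for w, v in zip(weights, client_values)) / total_w
           for k in range(n_policies)]
    return max(range(n_policies), key=lambda k: agg[k])


def mu_hat(x: float, a: int) -> float:
    """Hypothetical outcome model: action 1 yields reward x, action 0 yields 0."""
    return a * x


def e_hat(x: float, a: int) -> float:
    """Hypothetical logging propensity: actions were assigned 50/50."""
    return 0.5


# Toy example (made-up data): threshold policies "take action 1 iff x > t".
thresholds = [-1.0, 0.0, 1.0]
policies = [lambda x, t=t: int(x > t) for t in thresholds]

client_1 = [(-2.0, 1, -2.0), (-0.5, 0, 0.0), (0.5, 1, 0.5), (2.0, 0, 0.0)]
client_2 = [(1.5, 1, 0.5), (0.5, 0, 0.0), (-1.0, 1, -1.0)]

values = [local_values(d, policies, mu_hat, e_hat) for d in (client_1, client_2)]
best = server_select(values, weights=[len(client_1), len(client_2)])
print("selected threshold:", thresholds[best])
```

With a finite policy class, sharing per-policy value estimates lets the server optimize the weighted global objective exactly. For richer parametric policy classes, aggregating locally trained policies, as the abstract describes, is the more practical route.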

Paper Structure (69 sections, 18 theorems, 141 equations, 4 figures, 3 algorithms)

This paper contains 69 sections, 18 theorems, 141 equations, 4 figures, 3 algorithms.

Key Result

Theorem 5.3

Suppose Assumptions ass:dgp, ass:LocalDataSizeScaling, and ass:FiniteSampleError hold. Then, with probability at least $1-\delta$, the global regret satisfies …, where … and $c_1,c_2$ are universal constants.

Figures (4)

  • Figure 1: Empirical regret curves under homogeneous clients. All local regrets shown are for client $c=1$.
  • Figure 2: Empirical regret curves under heterogeneous clients. Top: $\lambda=\bar{n}$; Bottom: $\lambda=\bar{n}+\bar{\varepsilon}$. All local regrets shown are for client $c=1$.
  • Figure 3: Empirical regret curves for simulation experiments. Local regrets are shown for client $c=2$.
  • Figure : FedOPL: Server-Side

Theorems & Definitions (39)

  • Definition 3.1
  • Definition 3.2
  • Definition 4.1
  • Definition 5.1: Entropy integral
  • Definition 5.2: Skewness
  • Theorem 5.3: Global Regret Bound
  • Theorem 5.4: Local Regret Bound
  • Theorem 5.5: Local Distribution Shift Bound
  • Lemma 1.1: Hoeffding's inequality
  • Lemma 1.2: Talagrand's inequality
  • ...and 29 more