Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

Xuwei Yang, Anastasis Kratsios, Florian Krach, Matheus Grasselli, Aurelien Lucchi

TL;DR

We investigate a nearly regret-optimal heuristic for federated transfer learning that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space, together with the robustness of the regret-optimal algorithm to an adversary that perturbs $q$ training pairs by at most $\varepsilon>0$ across all training sets.

Abstract

We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_\theta$. Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $\theta^\star_{1},\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{\theta(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server) at each iteration (round). For the case where the model $f_\theta$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm, showing that an adversary that perturbs $q$ training pairs by at most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel.
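
To make the setting concrete, the following is a minimal sketch of finite-rank kernel regression on $N$ datasets. It is not the regret-optimal algorithm or the heuristic of the paper: it only computes one specialized parameter $\theta^\star_i$ per dataset by ridge regression and then measures how far a naive weighted average lies from them. The random cosine feature map, the ridge penalty `lam`, the synthetic data, the uniform weights `w`, and all function names are assumptions made for illustration.

```python
import numpy as np

def make_feature_map(d, p, rng):
    """Random feature map phi: R^d -> R^p, inducing the finite-rank
    kernel k(x, y) = phi(x)^T phi(y). Hypothetical choice of features."""
    W = rng.normal(size=(d, p))
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)
    return lambda X: np.sqrt(2.0 / p) * np.cos(X @ W + b)

def fit_ridge(Phi, y, lam):
    """Closed-form ridge solution of (Phi^T Phi + lam I) theta = Phi^T y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

rng = np.random.default_rng(0)
d, p, N, n = 2, 8, 3, 50   # input dim, parameter dim, datasets, pairs per dataset
lam = 1e-2                 # assumed ridge penalty
phi = make_feature_map(d, p, rng)

# Synthetic, related-but-different regression tasks (illustrative only).
datasets = []
for i in range(N):
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.1 * (i + 1) * X[:, 1]
    datasets.append((X, y))

# Specialized parameters theta*_1, ..., theta*_N, one per dataset.
theta_star = np.stack([fit_ridge(phi(X), y, lam) for X, y in datasets])

# Naive baseline: a weighted average with weights w in the simplex.
w = np.full(N, 1.0 / N)
theta_avg = w @ theta_star

# Total squared deviation of the shared parameter from the specialized ones.
deviation = float(np.sum(np.linalg.norm(theta_star - theta_avg, axis=1) ** 2))
print(theta_star.shape, theta_avg.shape, deviation)
```

Each specialized fit solves a $p\times p$ linear system, which illustrates why the parameter dimension $p$ enters operation counts in this setting.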

Paper Structure (7 sections, 1 theorem, 20 equations, 2 algorithms)

Key Result

Theorem 1

Fix $L\ge 0$ and let $\mathcal{F}$ be a non-empty family of $L$-Lipschitz functions mapping $\mathbb{R}^d$ to $\mathbb{R}^D$. There exists a constant $C\ge 1$ (depending only on $d+D$ and on $\mathcal{Z}$) such that, for every $0< \delta \le 1$, each $\eta\ge 0$, $\gamma>0$, and every $w\in \Delta_N$, … where $\bar{L}\stackrel{\text{def.}}{=} L_{\ell} \max\{1,L\}$, $\operatorname{KL}(\dots)$ …

Theorems & Definitions (2)

  • Remark 1
  • Theorem 1: Non-Asymptotic Transfer Learning Guarantee (General Lipschitz Learners)