Distributed Networked Multi-task Learning

Lingzhou Hong, Alfredo Garcia

TL;DR

This work develops a Distributed and Asynchronous algorithm for Multi-task Learning (DAMTL) that operates over a directed network partitioned into groups. By formulating a bi-level optimization with inner task-model estimation $\mathbf{W}$ and outer task-relationship precision $\boldsymbol{\Theta}$, the authors establish a two-timescale, asynchronous SGD framework with messengers to propagate cross-group information while preserving local computation; continuous-time approximations yield finite-time convergence guarantees for both inner and outer problems. The approach accounts for heterogeneous and correlated data streams and provides explicit conditions and step-size choices to ensure stability and bounded error in both the parameter estimates and the learned task-relationship matrix. Numerical experiments on synthetic Gaussian MRF temperature fields and real student-performance data demonstrate faster convergence and robustness of DAMTL compared to baseline approaches, illustrating its scalability and applicability to distributed, privacy-preserving learning scenarios.

Abstract

We consider a distributed multi-task learning scheme that accounts for multiple linear model estimation tasks with heterogeneous and/or correlated data streams. We assume that nodes can be partitioned into groups corresponding to different learning tasks and communicate according to a directed network topology. Each node estimates a linear model asynchronously and is subject to local (within-group) regularization and global (across groups) regularization terms targeting noise reduction and generalization performance improvement respectively. We provide a finite-time characterization of convergence of the estimators and task relation and illustrate the scheme's general applicability in two examples: random field temperature estimation and modeling student performance from different academic districts.
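The per-node update described in the abstract can be pictured with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' DAMTL algorithm: the function names `node_step` and `simulate`, the squared-error loss, the use of group averages as the cross-group information, the trace-style penalty weighted by a fixed matrix `theta`, and all numeric values are hypothetical choices made for illustration. In particular, the task-relationship matrix is held fixed here, whereas the paper learns it on a slower timescale.

```python
import numpy as np


def node_step(w, x, y, group_avg, task_avgs, theta_row,
              step, lam_local, lam_global):
    """One stochastic gradient step for a single node's linear model.

    All three gradient terms are illustrative assumptions:
      - fit:    gradient of 0.5 * (w @ x - y) ** 2
      - local:  pull toward the node's own group average (within-group)
      - global: gradient-style term of a penalty tr(W Theta W^T) over
                the group averages (across groups)
    """
    grad_fit = (w @ x - y) * x
    grad_local = lam_local * (w - group_avg)
    grad_global = lam_global * (task_avgs @ theta_row)
    return w - step * (grad_fit + grad_local + grad_global)


def simulate(num_steps=4000, seed=0):
    rng = np.random.default_rng(seed)
    d, groups, nodes_per_group = 3, 2, 4
    # Hypothetical ground-truth models, one row per task (group).
    w_true = np.array([[1.0, -1.0, 0.5],
                       [1.2, -0.8, 0.5]])
    # Fixed, assumed task-relationship matrix (positive definite).
    theta = np.array([[1.0, -0.5],
                      [-0.5, 1.0]])
    W = np.zeros((groups, nodes_per_group, d))  # one estimate per node

    for _ in range(num_steps):
        # Asynchrony is mimicked by waking one random node per iteration.
        g = rng.integers(groups)
        i = rng.integers(nodes_per_group)
        # The node observes one noisy sample from its own task's stream.
        x = rng.normal(size=d)
        y = w_true[g] @ x + 0.1 * rng.normal()
        # Group averages stand in for information shared over the network.
        task_avgs = W.mean(axis=1).T  # shape (d, groups)
        W[g, i] = node_step(W[g, i], x, y, task_avgs[:, g], task_avgs,
                            theta[g], step=0.05,
                            lam_local=0.1, lam_global=0.01)
    return W, w_true


if __name__ == "__main__":
    W, w_true = simulate()
    errors = np.linalg.norm(W - w_true[:, None, :], axis=2)
    print("per-node estimation error:\n", errors)
```

Only one node updates per iteration, so no global clock or synchronization barrier is needed; that is the property the sketch is meant to convey. The two penalty weights trade off agreement within a group against coupling across groups, and a nonzero `lam_global` introduces a small bias in exchange for that coupling.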

Paper Structure (29 sections, 5 theorems, 108 equations, 3 figures, 2 algorithms)

Key Result

Theorem 1

Let $\mathbf{w}_{i,t}$ evolve according to the continuous-time dynamics (dtwo2). Then [bound not recovered from the source], where $c=2\mu\kappa+2\delta_1\lambda_2\varpi$ with $\mu=\max_i \mu_i$ and $\varpi=\max_i \varpi_i$. The constants $A_i$ and $G_i$ describe the general Brownian terms and are defined in A and B. The function $h_2(V_t)$ is defined as [expression not recovered from the source], where $h_1(V_t)$ is a function of $V_t$ that bounds $\left\lVert\Theta_{j,t}\right\rVert_F$ ...

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 1
  • Definition 1
  • Lemma 1