Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs

Valentina Zantedeschi, Aurélien Bellet, Marc Tommasi

TL;DR

The paper proposes a fully decentralized framework for learning personalized models and an evolving collaboration graph without a central server. It introduces a boosting-based method that learns nonlinear personalized models while restricting communication to graph neighbors, and a proximal coordinate descent approach that learns a sparse collaboration graph via peer sampling. The two are combined in an alternating optimization scheme that updates the models given the graph and the graph given the models, with theoretical guarantees: $O(1/t)$ convergence for the model updates and fast convergence for the graph updates under strongly convex regularizers. Empirical results on synthetic and real datasets show improved accuracy and favorable communication costs compared to centralized and decentralized baselines, supporting the method's scalability and its privacy-friendly form of collaboration.
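
The model-update step can be illustrated with a minimal Frank-Wolfe (boosting-style) update for a single user. This is a sketch under stated assumptions, not the authors' implementation: it assumes a logistic loss on a matrix `A_k` of base-learner margins (entry (i, j) holding $y_i h_j(x_i)$), an $\ell_1$-ball constraint of radius `beta` on the weights, and a graph-smoothness penalty $(\mu/2)\sum_l W_{kl}\|\alpha_k - \alpha_l\|^2$ that only needs the neighbors' current models. The function names and the exact local objective are hypothetical.

```python
import numpy as np


def logistic_grad(A, alpha):
    """Gradient of mean_i log(1 + exp(-(A @ alpha)_i)) with respect to alpha."""
    margins = A @ alpha
    # Derivative of log(1 + exp(-m)) w.r.t. m is -1 / (1 + exp(m)).
    coeffs = -1.0 / (1.0 + np.exp(margins))
    return A.T @ coeffs / A.shape[0]


def fw_step(alpha_k, A_k, c_k, neighbor_alphas, neighbor_weights, mu, beta, t):
    """One Frank-Wolfe step for user k on a HYPOTHETICAL local objective:

        c_k * logistic loss on margins A_k @ alpha_k
        + (mu / 2) * sum_l W_kl * ||alpha_k - alpha_l||^2,
    subject to ||alpha_k||_1 <= beta. Only neighbors' models are used.
    """
    grad = c_k * logistic_grad(A_k, alpha_k)
    for alpha_l, w_kl in zip(neighbor_alphas, neighbor_weights):
        grad = grad + mu * w_kl * (alpha_k - alpha_l)
    # Linear minimization oracle over the l1-ball: a single signed vertex,
    # i.e. one base learner (column of A_k) is selected greedily.
    j = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(alpha_k)
    s[j] = -beta * np.sign(grad[j])
    gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
    return (1.0 - gamma) * alpha_k + gamma * s, j
```

Starting from `alpha_k = 0` and calling `fw_step` for t = 0, 1, 2, ... keeps the iterate inside the $\ell_1$-ball, because each update is a convex combination of feasible points; the greedy choice of one vertex per step is what gives Frank-Wolfe over the $\ell_1$-ball its boosting flavor.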

Abstract

We consider the fully decentralized machine learning scenario where many users with personal datasets collaborate to learn models through local peer-to-peer exchanges, without a central coordinator. We propose to train personalized models that leverage a collaboration graph describing the relationships between user personal tasks, which we learn jointly with the models. Our fully decentralized optimization procedure alternates between training nonlinear models given the graph in a greedy boosting manner, and updating the collaboration graph (with controlled sparsity) given the models. Throughout the process, users exchange messages only with a small number of peers (their direct neighbors when updating the models, and a few random users when updating the graph), ensuring that the procedure naturally scales with the number of users. Overall, our approach is communication-efficient and avoids exchanging personal data. We provide an extensive analysis of the convergence rate, memory and communication complexity of our approach, and demonstrate its benefits compared to competing techniques on synthetic and real datasets.
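
The graph-update step can be sketched as a projected (proximal) coordinate gradient update in which one user refreshes its edges to a few randomly sampled peers. The objective below is a hypothetical stand-in, not the paper's: a smoothness term $\sum_{k<l} W_{kl}\|\alpha_k-\alpha_l\|^2$, a log-degree barrier weighted by `lam` that keeps every user connected, and an $\ell_2$ term weighted by `eta`. The parameter names `kappa`, `lam`, `eta`, `step`, and `delta` are illustrative assumptions. The proximal operator of the nonnegativity constraint clips weights at zero, which is where sparsity comes from.

```python
import numpy as np


def graph_coordinate_updates(W, alphas, k, kappa, lam, eta, step, delta=1e-3, rng=None):
    """User k refreshes its edges to kappa random peers (HYPOTHETICAL sketch).

    Stand-in objective over symmetric, nonnegative weights W:
        sum_{k<l} W_kl ||alpha_k - alpha_l||^2
        - lam * sum_k log(d_k + delta) + (eta / 2) * sum_{k<l} W_kl^2,
    where d_k = sum_l W_kl is user k's weighted degree.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = W.shape[0]
    others = np.array([l for l in range(K) if l != k])
    peers = rng.choice(others, size=min(kappa, len(others)), replace=False)
    for l in peers:
        # Peer l would send its current model alpha_l and its degree d_l.
        dist = np.sum((alphas[k] - alphas[l]) ** 2)
        d_k = W[k].sum() + delta
        d_l = W[l].sum() + delta
        # Partial derivative of the stand-in objective w.r.t. W_kl.
        grad = dist - lam * (1.0 / d_k + 1.0 / d_l) + eta * W[k, l]
        # Proximal step for W_kl >= 0: clip at zero (source of sparsity).
        W[k, l] = W[l, k] = max(0.0, W[k, l] - step * grad)
    return W, peers
```

In a toy run with two clusters of identical models, repeated calls drive the cross-cluster weights to exactly zero while within-cluster weights stay positive, because the smoothness term dominates the gradient only for dissimilar pairs.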

Paper Structure

This paper contains 31 sections, 6 theorems, 51 equations, 7 figures, 3 tables.

Key Result

Theorem 1

Our decentralized Frank-Wolfe algorithm takes at most $6 K (C^\otimes_f + p_0)/\varepsilon$ iterations to find an approximate solution $\alpha$ that satisfies, in expectation, $f(\alpha) - f(\alpha^*) \leq \varepsilon$, where $C^\otimes_f \leq 4\beta^2 \sum_{k=1}^K d_k(w)\,(c_k \|A_k\|^2 + \mu_1)$ and …
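
To get a feel for the bound in Theorem 1, the sketch below evaluates the curvature upper bound and the resulting iteration count. The constant $p_0$ is defined in the truncated part of the statement, so it is simply passed in here; the norm used for $\|A_k\|$ is likewise taken as a precomputed number. The function names and all numeric values are made up for illustration.

```python
import numpy as np


def curvature_upper_bound(beta, degrees, confidences, A_norms, mu1):
    """Upper bound on C_f from Theorem 1:
    4 * beta^2 * sum_k d_k(w) * (c_k * ||A_k||^2 + mu_1)."""
    d = np.asarray(degrees, dtype=float)
    c = np.asarray(confidences, dtype=float)
    a = np.asarray(A_norms, dtype=float)
    return float(4.0 * beta**2 * np.sum(d * (c * a**2 + mu1)))


def fw_iteration_bound(K, C_f, p0, eps):
    """Iterations sufficient for E[f(alpha) - f(alpha*)] <= eps: 6 K (C_f + p0) / eps."""
    return 6.0 * K * (C_f + p0) / eps


# Made-up values, for illustration only.
C_f = curvature_upper_bound(beta=1.0, degrees=[1, 1], confidences=[1, 1], A_norms=[1, 1], mu1=0.0)
print(C_f)                                                  # 8.0
print(fw_iteration_bound(K=2, C_f=C_f, p0=0.5, eps=0.125))  # 816.0
```

Because the count is proportional to $1/\varepsilon$, halving the target accuracy doubles the iteration budget, which is the $O(1/t)$ rate in another form; the factor $K$ grows linearly with the number of users.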

Figures (7)

  • Figure 1: Results on the Moons dataset. Top: Training and test accuracy w.r.t. iterations (we display the performance of non-collaborative baselines at convergence with a straight line). Global-lin falls outside the plotted range, at $\sim$50% accuracy. Bottom: Average number of neighbors w.r.t. iterations for Dada-Learned.
  • Figure 2: Graph learned on Moons. Top: Graph weights for the oracle and learned graph (with users grouped by cluster). Bottom: Visualization of the graph. The node size is proportional to the confidence $c_k$ and the color reflects the relative value of the local loss (greener = smaller loss). Nodes are labeled with their rotation angle, and a darker edge color indicates a higher weight.
  • Figure 3: Impact of $\kappa$ on the convergence rate and the communication cost for learning the graph on Moons.
  • Figure 4: Average test accuracies with respect to the number of training points in the local sets.
  • Figure 5: Average test accuracies with respect to the communication cost (# bits).
  • ...and 2 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Remark 1: Other loss functions
  • Theorem 2
  • Remark 2: Reducing the number of variables
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 2 more