Federating to Grow Transformers with Constrained Resources without Model Sharing

Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu

TL;DR

Fed-Grow introduces a privacy-preserving federated framework to grow transformers from heterogeneous pre-trained small models without sharing data or small models. It hinges on the Dual-LiGO architecture, comprising Local-LiGO to standardize client models and Global-LiGO to exchange a shared growth operator, thereby accelerating training and reducing resource use. Experiments across NLP and CV tasks show improved accuracy/precision, reduced variance across clients, and substantial reductions in trainable parameters and communication compared with scratch-based federated training and independent expansion. The work enables resource-constrained users to leverage large transformers in distributed settings while preserving data and model privacy.

Abstract

The high resource consumption of large-scale models discourages resource-constrained users from developing their own customized transformers. To this end, this paper considers a federated framework named Fed-Grow, in which multiple participants cooperatively scale a transformer from their pre-trained small models. Under Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to help participants expand their pre-trained small models into a transformer. In Dual-LiGO, the Local-LiGO part addresses the heterogeneity caused by the participants' differing pre-trained models, and the Global-LiGO part is shared to exchange the implicit knowledge from the participants' pre-trained models, local data, and training processes. Sharing only the Global-LiGO, rather than the models themselves, strengthens the privacy of the approach. Compared with several state-of-the-art methods in simulation, our approach achieves higher accuracy, better precision, and lower computation and communication costs. To the best of our knowledge, most previous model-scaling works are centralized, and our work is the first to cooperatively grow a transformer from multiple heterogeneous pre-trained models while protecting user privacy in terms of both local data and models. We hope that our approach can extend transformers to broadly distributed scenarios and encourage more resource-constrained users to enjoy the benefits of large-scale transformers.
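
The division of labor between Local-LiGO and Global-LiGO can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes each growth operator is a single expansion matrix applied on both sides of one square weight matrix, that the server combines the shared operators by plain averaging, and that all names (`grow`, `fedavg`, `small_dims`, `mid_dim`, `large_dim`) and sizes are hypothetical. Operator training is omitted entirely.

```python
import numpy as np

def grow(weight, out_op, in_op):
    """Linearly expand a weight matrix: out_op @ weight @ in_op.T."""
    return out_op @ weight @ in_op.T

def fedavg(operators):
    """Average a list of equally shaped operators (assumed aggregation rule)."""
    return np.mean(np.stack(operators), axis=0)

rng = np.random.default_rng(0)
small_dims = [4, 6, 8]        # heterogeneous client widths (hypothetical)
mid_dim, large_dim = 8, 16    # common intermediate and target widths (hypothetical)

clients = []
for d in small_dims:
    clients.append({
        # Private pre-trained weight; never leaves the client.
        "small_w": rng.normal(size=(d, d)),
        # Local-LiGO: maps this client's width to the common width; stays local.
        "local_op": rng.normal(size=(mid_dim, d)),
        # Global-LiGO: same shape on every client, so it can be shared.
        "global_op": rng.normal(size=(large_dim, mid_dim)),
    })

# Server step: only the Global-LiGO operators are exchanged and averaged.
shared_global = fedavg([c["global_op"] for c in clients])

large_weights = []
for c in clients:
    c["global_op"] = shared_global.copy()
    # Local-LiGO standardizes the heterogeneous small weight ...
    mid_w = grow(c["small_w"], c["local_op"], c["local_op"])
    # ... and the shared Global-LiGO expands it to the target width.
    large_weights.append(grow(mid_w, c["global_op"], c["global_op"]))
```

In this sketch every client ends up with a weight of the same target shape even though the small models differ in width, and the only tensor that crosses the network is the Global-LiGO operator, whose size is independent of any client's private model.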

Paper Structure

This paper contains 25 sections, 7 equations, 7 figures, 11 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Illustration of different methods for efficient model training. The upper left subfigure shows the conventional approach of training a large model from scratch. The lower left subfigure shows model-reuse methods that exploit a small pre-trained model. The upper right subfigure shows the conventional federated learning (FL) approach. The lower right subfigure shows our proposed framework: Fed-Grow with Dual-LiGO.
  • Figure 2: Workflow of Dual-LiGO.
  • Figure 3: Comparison of Dual-LiGO (Agg) and NoAgg on six datasets under IID and non-IID settings with 10 clients. The subfigures show the mean accuracy (or precision) of the methods on each dataset. Each subfigure shows four lines: IID_AGG, IID_NoAgg, NIID_AGG, NIID_NoAgg, which represent the test accuracy (or precision) of the methods under different settings.
  • Figure 4: Comparison of Dual-LiGO (Agg) and NoAgg on six datasets under IID and non-IID settings with 20 clients.
  • Figure 5: Comparison of Dual-LiGO (Agg) and NoAgg on agnews under IID and non-IID settings with 30-50 clients.
  • ...and 2 more figures