eFedLLM: Efficient LLM Inference Based on Federated Learning

Shengwen Ding, Chenhui Hu

TL;DR

eFedLLM improves the operational efficiency and affordability of LLM inference by using transformer-based federated learning (FL) with model-parallel distributed training, which spreads computational load and memory requirements across a network of participants.

Abstract

Large Language Models (LLMs) herald a transformative era in artificial intelligence (AI). However, the scale of their data and parameters demands substantial computational and memory resources, which restricts their accessibility to a broader range of users and researchers. This paper introduces an approach that improves the operational efficiency and affordability of LLM inference. By using transformer-based federated learning (FL) with model-parallel distributed training, our model distributes the computational load and memory requirements across a network of participants. This strategy permits users, especially those with limited resources, to train state-of-the-art LLMs collaboratively. We also introduce an incentive mechanism within the FL framework that rewards constructive contributions and filters out malicious activities, thereby safeguarding the integrity and reliability of the training process. In addition, we apply memory hierarchy strategies and Singular Value Decomposition (SVD) to the weight matrices to further improve computational and memory efficiency. Our results, derived from formulaic analyses and numerical calculations, demonstrate significant optimization of resource use and help democratize access to cutting-edge LLMs, so that a wide range of users can both contribute to and benefit from these advanced models.
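
To make the SVD step mentioned in the abstract concrete, the following is a minimal sketch of truncating a weight matrix to its top k singular values with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the matrix size, the rank k, the helper names truncated_svd and compression_ratio, and the definition of compression ratio as original parameters divided by stored factor parameters are all hypothetical choices made for this example.

```python
import numpy as np

def truncated_svd(weight, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of a 2-D weight matrix."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def compression_ratio(m, n, k):
    """Assumed definition: m*n original values over k*(m+n+1) stored values."""
    return (m * n) / (k * (m + n + 1))

rng = np.random.default_rng(0)

# Hypothetical 256 x 128 weight matrix: rank 16 plus a little noise.
weight = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 128))
weight += 0.01 * rng.standard_normal(weight.shape)

# Keep only the 16 largest singular values and rebuild an approximation.
u_k, s_k, vt_k = truncated_svd(weight, k=16)
approx = (u_k * s_k) @ vt_k  # scales each column of U_k by its singular value

# Relative Frobenius error of the low-rank approximation.
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
ratio = compression_ratio(256, 128, 16)

print(f"relative error: {rel_error:.4f}, compression ratio: {ratio:.2f}")
```

Storing the three factors instead of the full matrix is what saves memory: a participant holds k(m + n + 1) values rather than mn. Preserving fewer singular values raises the compression ratio at the cost of a larger approximation error, which is the trade-off that Figure 5 is described as plotting.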

Paper Structure

This paper contains 14 sections, 2 theorems, 22 equations, 7 figures, 3 tables.

Key Result

theorem 1

Let $A_{m \times n}$ and $B_{n \times k}$ be matrices, and let $C = A \times B$ represent the matrix multiplication. Then, when deploying a Federated Learning model over a centralized model, the reduction in memory read time, $R_t$, can be expressed as:

Figures (7)

  • Figure 1: Two Types of Federated Learning Models
  • Figure 2: The Transformer Model in LLM
  • Figure 3: eFedLLM Framework Overview
  • Figure 4: Memory Hierarchy
  • Figure 5: The relationship between the compression ratio and the retained accuracy when preserving different numbers of singular values.
  • ...and 2 more figures

Theorems & Definitions (2)

  • theorem 1
  • proposition 1