LEGO: Language Model Building Blocks

Shrenik Bhansali, Alwin Jin, Tyler Lizzo, Larry Heck

TL;DR

LEGO is a novel technique for extracting small language models (SLMs) from a large language model (LLM) and recombining them. Experiments show that it enables model heterogeneity and mitigates the effects of data heterogeneity while maintaining LLM robustness.

Abstract

Large language models (LLMs) are essential in natural language processing (NLP) but are costly in data collection, pre-training, fine-tuning, and inference. Task-specific small language models (SLMs) offer a cheaper alternative but lack robustness and generalization. This paper proposes LEGO, a novel technique to extract SLMs from an LLM and recombine them. Using state-of-the-art LLM pruning strategies, we create task- and user-specific SLM building blocks that are efficient to fine-tune and run. LEGO then reconstructs the LLM using Federated Learning and a novel aggregation scheme, maintaining robustness without high costs while preserving user data privacy. We experimentally demonstrate LEGO's versatility, showing that it enables model heterogeneity and mitigates the effects of data heterogeneity while maintaining LLM robustness.
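The abstract does not say which pruning strategy is used to carve SLM building blocks out of the LLM. As a minimal sketch, assuming unstructured magnitude pruning as a stand-in (a common baseline, not necessarily the authors' choice), a building block can be represented as a binary mask over the LLM's weights. The helpers `magnitude_mask` and `extract_slm` are hypothetical names introduced for illustration, and weights are modeled as a flat Python list rather than real tensors.

```python
from typing import List


def magnitude_mask(weights: List[float], keep_ratio: float) -> List[bool]:
    """Keep the largest-magnitude fraction of weights (hypothetical helper).

    Unstructured magnitude pruning is only an assumed stand-in for the
    unspecified LLM pruning strategy.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(keep_ratio * len(weights)))
    # Sort indices by descending |w|; ties are broken by index for determinism.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(weights))]


def extract_slm(weights: List[float], mask: List[bool]) -> List[float]:
    """Return the SLM block: kept weights stay in place, pruned ones become 0."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


llm = [0.9, -0.1, 0.05, -0.7, 0.3, 0.0]
mask = magnitude_mask(llm, keep_ratio=0.5)
print(mask)                   # [True, False, False, True, True, False]
print(extract_slm(llm, mask))  # [0.9, 0.0, 0.0, -0.7, 0.3, 0.0]
```

Keeping pruned entries as zeros at their original positions, instead of physically shrinking the tensors, is a simplifying choice: every block stays index-aligned with the LLM, which makes recombining blocks straightforward. A real structured-pruning pipeline would produce genuinely smaller matrices and would need an explicit index map back to the LLM.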

Paper Structure

This paper contains 19 sections, 1 equation, 6 figures, 6 tables, and 2 algorithms.

Figures (6)

  • Figure 1: The LEGO workflow. An LLM is first pruned to create SLMs, then each SLM is assigned to a client. Each client then fine-tunes its SLM on its local data. After fine-tuning, the models are aggregated to create a global update. The global update is then applied to all the client SLMs as well as a global LLM. Eventually, after enough updates, a final global LLM is derived.
  • Figure 2: A symbolic representation of our heterogeneous aggregation method.
  • Figure 3: The performance of clients after each global update.
  • Figure 4: Combining differently shaped building blocks to create a larger block.
  • Figure 5: The accuracy of LEGO components on HellaSwag after aggregation with one omission. The solid blue line is the accuracy of the fine-tuned model, and the dotted black line is the globally updated performance, as listed in the results table (table:results2).
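Figures 1 and 2 describe a round in which clients fine-tune their pruned blocks locally and the results are aggregated into a global update that is applied to both the client SLMs and the global LLM. The aggregation scheme itself is not specified here, so the following is only a sketch under an assumption borrowed from mask-based heterogeneous federated learning (e.g., HeteroFL): each parameter is averaged over just the clients whose block contains it, and parameters held by no client keep their global value. `aggregate_round` and `broadcast` are hypothetical names, and weights are again flat Python lists aligned with the LLM's parameter positions.

```python
from typing import List


def aggregate_round(global_w: List[float],
                    client_ws: List[List[float]],
                    masks: List[List[bool]]) -> List[float]:
    """One heterogeneous aggregation step (hypothetical, not the paper's scheme).

    masks[k][i] is True if client k's SLM block contains parameter i.
    Parameter i moves by the mean delta of the clients holding it, which
    equals the mean of their fine-tuned values; unheld parameters are kept.
    """
    new_w = list(global_w)
    for i, g in enumerate(global_w):
        deltas = [w[i] - g for w, m in zip(client_ws, masks) if m[i]]
        if deltas:
            new_w[i] = g + sum(deltas) / len(deltas)
    return new_w


def broadcast(global_w: List[float], mask: List[bool]) -> List[float]:
    """Apply the global update to one client: copy back only its own block."""
    return [g if m else 0.0 for g, m in zip(global_w, mask)]


global_w = [1.0, 2.0, 3.0, 4.0]
masks = [[True, True, False, False],   # client 0 holds params 0 and 1
         [False, True, True, False]]   # client 1 holds params 1 and 2
client_ws = [[1.5, 2.4, 0.0, 0.0],     # client 0 after local fine-tuning
             [0.0, 2.0, 3.3, 0.0]]     # client 1 after local fine-tuning

new_w = aggregate_round(global_w, client_ws, masks)
print([round(x, 6) for x in new_w])                        # [1.5, 2.2, 3.3, 4.0]
print([round(x, 6) for x in broadcast(new_w, masks[1])])   # [0.0, 2.2, 3.3, 0.0]
```

Parameter 1 is shared by both clients, so its update is the average of their two results, while parameter 3 belongs to no block and is left untouched. Repeating this round corresponds to the "after enough updates" loop in Figure 1.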