LLMs as On-demand Customizable Service

Souvika Sarkar, Mohammad Fakhruddin Babar, Monowar Hasan, Shubhra Kanti Karmaker

TL;DR

This work tackles the challenge of making large language models accessible across diverse hardware by proposing a hierarchical, distributed LLM architecture that supports on-demand, customizable services. The approach organizes models into layered tiers (Master LLM, language-, domain-, and sub-domain-specific models) and leverages distillation, continual learning, and bidirectional Upstream/Downstream knowledge transfer to synchronize knowledge across layers. A healthcare use case illustrates how a resource-constrained clinician can select, deploy, and continually update an appropriate model on local devices, while updates propagate to peer models. The paper also discusses deployment challenges such as model selection, update coordination, catastrophic forgetting, update timing, and security, and advocates an open-source implementation to democratize access to AI capabilities across platforms.
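The Upstream/Downstream synchronization summarized above can be illustrated with a toy tree of models. This is a minimal sketch under loudly stated assumptions. The names `TierModel`, `upstream`, `downstream`, and `continual_update` are hypothetical. "Knowledge" is reduced to a set of update labels rather than weights changed by distillation or continual learning. Upstream is assumed to mean "toward the Master LLM" and Downstream "toward sub-domain models". Peer models are assumed to receive an update by way of their shared parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality avoids recursing through parent/child links
class TierModel:
    """One node in a hypothetical hierarchy: Master -> language -> domain -> sub-domain."""
    name: str
    tier: str
    parent: Optional[TierModel] = field(default=None, repr=False)
    children: list[TierModel] = field(default_factory=list)
    # Stand-in for learned knowledge (assumption): a real system would update
    # model weights, e.g. via distillation or continual learning, not a label set.
    knowledge: set[str] = field(default_factory=set)

    def add_child(self, child: TierModel) -> TierModel:
        child.parent = self
        self.children.append(child)
        return child


def upstream(node: TierModel, update: str) -> None:
    """Assumed Upstream direction: copy an update to every ancestor up to the Master LLM."""
    current = node.parent
    while current is not None:
        current.knowledge.add(update)
        current = current.parent


def downstream(node: TierModel, update: str) -> None:
    """Assumed Downstream direction: copy an update to every descendant of `node`."""
    for child in node.children:
        child.knowledge.add(update)
        downstream(child, update)


def continual_update(node: TierModel, update: str) -> None:
    """Hypothetical sync policy: learn locally, send the update Upstream,
    then let the parent push it Downstream so peer (sibling) models receive it."""
    node.knowledge.add(update)
    upstream(node, update)
    if node.parent is not None:
        downstream(node.parent, update)


# Toy hierarchy mirroring the tiers named in the paper (model names are illustrative).
master = TierModel("master-llm", "master")
english = master.add_child(TierModel("english-llm", "language"))
health = english.add_child(TierModel("healthcare-llm", "domain"))
cardio = health.add_child(TierModel("cardiology-llm", "sub-domain"))
onco = health.add_child(TierModel("oncology-llm", "sub-domain"))

# A clinician's local cardiology model learns something new.
continual_update(cardio, "new-guideline")
print(sorted(m.name for m in (master, english, health, onco) if "new-guideline" in m.knowledge))
```

After the update, the ancestors (healthcare, English, Master) and the peer oncology model all hold the label. Routing peer updates through the parent means siblings never need to know about one another. The summary does not say whether the paper's architecture propagates updates this way.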

Abstract

Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce the concept of a hierarchical, distributed LLM architecture that aims to enhance the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand access to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the hierarchical LLM concept will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering broader advances in AI technology.
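The trade-off between a device's computational resources and the user's application needs can be sketched as a selection rule over the tiers. The rule below is an assumption for illustration, not the paper's method: pick the most specialized model that matches the user's domain and fits the memory budget, and fall back to more general tiers otherwise. `Candidate`, `select_model`, `CATALOG`, and every memory figure are hypothetical placeholders, not measured model sizes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

# Higher rank = more specialized tier (assumed ordering of the hierarchy).
TIER_RANK = {"master": 0, "language": 1, "domain": 2, "sub-domain": 3}


@dataclass(frozen=True)
class Candidate:
    name: str
    tier: str              # one of the keys of TIER_RANK
    domain: Optional[str]  # None = general-purpose (master/language tiers)
    memory_gb: float       # hypothetical footprint, NOT a measured value


def select_model(candidates: Iterable[Candidate], budget_gb: float,
                 needs: set[str]) -> Optional[Candidate]:
    """Return the most specialized candidate that fits the memory budget and is
    either general-purpose or covers one of the user's needed domains.
    Ties on tier are broken in favor of the smaller model."""
    usable = [
        c for c in candidates
        if c.memory_gb <= budget_gb and (c.domain is None or c.domain in needs)
    ]
    if not usable:
        return None  # no model in the catalog fits on this device
    return max(usable, key=lambda c: (TIER_RANK[c.tier], -c.memory_gb))


# Hypothetical catalog; sizes are placeholders chosen only to show the fallback.
CATALOG = [
    Candidate("master-llm", "master", None, 350.0),
    Candidate("english-llm", "language", None, 28.0),
    Candidate("healthcare-llm", "domain", "healthcare", 14.0),
    Candidate("cardiology-llm", "sub-domain", "cardiology", 7.0),
]

if __name__ == "__main__":
    # A clinician's laptop with an assumed 16 GB budget.
    print(select_model(CATALOG, 16.0, {"healthcare", "cardiology"}).name)
    print(select_model(CATALOG, 16.0, {"healthcare", "oncology"}).name)
```

With a 16 GB budget, the cardiology query resolves to the sub-domain model. This catalog has no oncology sub-domain model, so the oncology query falls back to the domain-level `healthcare-llm`.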

Paper Structure

This paper contains 6 sections and 2 figures.

Figures (2)

  • Figure 1: A Use Case - Leveraging Hierarchical Language Model Architecture as On-Demand Service.
  • Figure 2: High-level schematic diagram of a multi-tier distributed LLM architecture.