LLMs as On-demand Customizable Service
Souvika Sarkar, Mohammad Fakhruddin Babar, Monowar Hasan, Shubhra Kanti Karmaker
TL;DR
This work tackles the challenge of making large language models accessible across diverse hardware by proposing a hierarchical, distributed LLM architecture that supports on-demand, customizable services. The approach organizes models into layered tiers (Master LLM, language-, domain-, and sub-domain-specific models) and leverages distillation, continual learning, and bidirectional Upstream/Downstream knowledge transfer to synchronize knowledge across layers. A healthcare use case illustrates how a resource-constrained clinician can select, deploy, and continually update an appropriate model on local devices, while updates propagate to peer models. The paper also discusses deployment challenges such as model selection, update coordination, catastrophic forgetting, update timing, and security, and advocates an open-source implementation to democratize access to AI capabilities across platforms.
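The layered organization and the resource-aware model selection described above can be pictured as a walk down a tree of models. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The `ModelNode` class, the `select_model` function, the path keys (`"en"`, `"healthcare"`, `"cardiology"`), and the memory figures are all hypothetical. The selection policy is also an assumption made here for illustration: return the most specialized model along the requested path that fits the device's memory budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelNode:
    """One tier in a hypothetical LLM hierarchy (Master -> language -> domain -> sub-domain)."""
    name: str
    tier: str          # "master", "language", "domain", or "sub-domain"
    memory_gb: float   # hypothetical memory footprint needed to run the model locally
    children: dict[str, ModelNode] = field(default_factory=dict)

    def add(self, key: str, child: ModelNode) -> ModelNode:
        # Attach a more specialized model under this tier and return it for chaining.
        self.children[key] = child
        return child


def select_model(root: ModelNode, path: list[str], budget_gb: float) -> Optional[ModelNode]:
    """Walk down the requested path and return the most specialized model
    that fits within the device's memory budget (None if nothing fits)."""
    best = root if root.memory_gb <= budget_gb else None
    node = root
    for key in path:
        node = node.children.get(key)
        if node is None:
            break  # requested specialization does not exist; keep the best match so far
        if node.memory_gb <= budget_gb:
            best = node  # deeper nodes are more specialized, so prefer them when they fit
    return best


# Hypothetical hierarchy with made-up memory figures.
master = ModelNode("master-llm", "master", memory_gb=80.0)
en = master.add("en", ModelNode("english-llm", "language", memory_gb=16.0))
health = en.add("healthcare", ModelNode("english-healthcare-llm", "domain", memory_gb=6.0))
health.add("cardiology", ModelNode("english-cardiology-llm", "sub-domain", memory_gb=3.0))

# A clinician's laptop with roughly 8 GB available for the model.
chosen = select_model(master, ["en", "healthcare", "cardiology"], budget_gb=8.0)
print(chosen.name)  # english-cardiology-llm
```

In this sketch the walk continues past tiers that are too large for the device, because a deeper, narrower model may still fit even when its parent does not. If the requested specialization is missing (for example, an `"oncology"` key that was never added), the walk stops and falls back to the closest ancestor that fits.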
Abstract
Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including heavy resource demands, long training times, and limited scalability. To address these issues, we introduce the concept of a hierarchical, distributed LLM architecture that aims to enhance the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). Through a "layered" approach, the proposed architecture enables on-demand access to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of a hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general.
