PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

Rana Muhammad Shahroz Khan, Pingzhi Li, Sukwon Yun, Zhenyu Wang, Shahriar Nirjon, Chau-Wai Wong, Tianlong Chen

TL;DR

PortLLM addresses the challenge of adapting to evolving large language models without repeated fine-tuning. It introduces a training-free mechanism that ports domain-specific knowledge via lightweight patches derived from LoRA, so that the knowledge transfers to newer model iterations while compute and memory usage stay low. The framework is theoretically supported by a lemma stating that the residual between the ported patch and the patch that fine-tuning the updated model would produce is negligible. It is empirically validated across seven downstream tasks, four architectures, and multiple continued-pretraining datasets, showing substantial zero-shot gains and major efficiency improvements such as up to $12.2\times$ GPU memory reduction. This approach offers a practical pathway for continual personalization of LLMs, reducing dependence on cloud-based training and enabling on-device, cost-effective deployment in domains such as healthcare and science.

Abstract

As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved (i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain application. Even though fine-tuning costs have been reduced thanks to innovations in parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing resources for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, may be time-restricted, making it crucial to retain the knowledge encoded in earlier fine-tuned rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows subsequent seamless plugging for the continual personalization of evolved LLMs at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage. Finally, we provide theoretical justifications to understand the portability of our model update patches, which offer new insights into the theoretical dimension of LLM personalization.
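The two steps in the abstract (build a patch once, then plug it into an evolved model) can be illustrated with a toy sketch. It assumes the patch is the LoRA low-rank product B·A and that "plugging" means adding it to the evolved model's weights. This summary does not spell out the formula, so treat that as an assumption. The helper names (`make_patch`, `port_patch`) and all numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def make_patch(lora_b: Matrix, lora_a: Matrix) -> Matrix:
    """Assumed patch: the low-rank update B @ A learned once on the OLD base model."""
    return matmul(lora_b, lora_a)


def port_patch(evolved_weight: Matrix, patch: Matrix) -> Matrix:
    """Assumed porting step: add the stored patch to the evolved weight.

    No gradient steps or fine-tuning data are involved here.
    """
    return add(evolved_weight, patch)


# Hypothetical rank-1 LoRA factors for a 2x2 layer.
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, -1.0]]
patch = make_patch(lora_b, lora_a)

# Stand-in for the same layer after the provider updated the base model.
evolved_weight = [[1.0, 0.0], [0.0, 1.0]]
personalized = port_patch(evolved_weight, patch)
```

In this sketch only the two small factors need to be stored between model releases, and porting costs one matrix addition per patched layer. Whether this matches the exact procedure in the paper should be checked against its method section.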

Paper Structure

This paper contains 28 sections, 17 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: The diagram illustrates the core components of PortLLM, a training-free framework to port personalized knowledge between evolving LLMs. Initially, a pretrained LLM is fine-tuned using LoRA. We transfer this task-specific knowledge without requiring the newer updated model to be fine-tuned again. This allows for continual performance improvements on downstream tasks without additional fine-tuning.
  • Figure 2: LLM's evolution & personalization cycle.