
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees

William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme

TL;DR

AdapterSwap introduces a parameter-efficient continual learning framework that partitions data into low-rank adapters (LoRAs) and composes them at inference to enforce data access-control and data removal guarantees while preserving prior knowledge. A retrieval model using SBERT embeddings, LDA, and a Gaussian Mixture Model selects relevant adapters during inference, enabling top-1/2/3 mixtures conditioned on access permissions. Empirical results across Falcon-7B, Gemma-7B, Llama-2-7B, and Mistral-7B show strong retrieval of appropriate adapters (top-1 69–81%, top-3 93–95%), efficient data removal (adapter retraining up to 80× cheaper than full retraining), and reduced forgetting compared to chronological fine-tuning. The approach offers practical benefits for organizations requiring dynamic data management in LLM deployments, while suggesting future work on improved retrieval, adapter mixing, and integration with RAG frameworks.
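
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the retriever has already produced one score per adapter (for example, GMM posterior probabilities), and the names `select_adapters`, `mixture_weights`, and the example adapter labels are hypothetical. Weighting the mixture by renormalized scores is likewise an assumption made for illustration.

```python
from typing import Dict, List, Set


def select_adapters(scores: Dict[str, float], permitted: Set[str], k: int = 3) -> List[str]:
    """Return up to k adapter names the user may access, best score first."""
    # Filter by access permission *before* ranking, so an adapter the user
    # cannot access never enters the mixture.
    allowed = [name for name in scores if name in permitted]
    allowed.sort(key=lambda name: scores[name], reverse=True)
    return allowed[:k]


def mixture_weights(scores: Dict[str, float], selected: List[str]) -> Dict[str, float]:
    """Renormalize the scores of the selected adapters so they sum to 1.

    Assumption: score-proportional weighting; the source does not specify
    how the selected adapters are weighted.
    """
    if not selected:
        return {}
    total = sum(scores[name] for name in selected)
    if total <= 0:
        # Fall back to a uniform mixture when all scores are zero.
        return {name: 1.0 / len(selected) for name in selected}
    return {name: scores[name] / total for name in selected}


# Hypothetical retriever scores for one query.
scores = {"finance": 0.50, "legal": 0.30, "hr": 0.15, "eng": 0.05}
permitted = {"legal", "hr", "eng"}  # this user cannot access "finance"

top2 = select_adapters(scores, permitted, k=2)  # ["legal", "hr"]
weights = mixture_weights(scores, top2)
```

Filtering before ranking is the design choice that matters here: the highest-scoring adapter ("finance" in the example) is excluded outright rather than down-weighted.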

Abstract

Large language models (LLMs) are increasingly capable of completing knowledge-intensive tasks by recalling information from a static pretraining corpus. Here we are concerned with LLMs in the context of evolving data requirements. For instance: batches of new data that are introduced periodically; subsets of data with user-based access controls; or requirements on dynamic removal of documents with guarantees that associated knowledge cannot be recalled. We wish to satisfy these requirements while at the same time ensuring a model does not forget old information when new data becomes available. To address these issues, we introduce AdapterSwap, a training and inference scheme that organizes knowledge from a data collection into a set of low-rank adapters, which are dynamically composed during inference. Our experiments demonstrate AdapterSwap's ability to support efficient continual learning, while also enabling organizations to have fine-grained control over data access and deletion.
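
To make the removal requirement concrete, the bookkeeping it implies can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's code: the `AdapterRegistry` class and its method names are hypothetical, and actual adapter training is left out. The point it shows is that each document belongs to exactly one partition, so deleting a document marks only that partition's adapter for retraining.

```python
from collections import defaultdict
from typing import Dict, Set


class AdapterRegistry:
    """Hypothetical bookkeeping: which documents each adapter was trained on."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Set[str]] = defaultdict(set)  # partition -> doc ids
        self.doc_to_partition: Dict[str, str] = {}               # doc id -> partition
        self.stale: Set[str] = set()                             # adapters needing retraining

    def add_document(self, doc_id: str, partition_id: str) -> None:
        if doc_id in self.doc_to_partition:
            raise ValueError(f"{doc_id} is already assigned to a partition")
        self.partitions[partition_id].add(doc_id)
        self.doc_to_partition[doc_id] = partition_id
        self.stale.add(partition_id)  # new data: this adapter must be (re)trained

    def mark_trained(self, partition_id: str) -> None:
        self.stale.discard(partition_id)

    def remove_document(self, doc_id: str) -> str:
        """Delete a document and return the single partition that must be retrained."""
        partition_id = self.doc_to_partition.pop(doc_id)
        self.partitions[partition_id].discard(doc_id)
        self.stale.add(partition_id)
        return partition_id


registry = AdapterRegistry()
registry.add_document("d1", "p0")
registry.add_document("d2", "p0")
registry.add_document("d3", "p1")
registry.mark_trained("p0")
registry.mark_trained("p1")

affected = registry.remove_document("d2")  # only "p0" becomes stale
```

Because no adapter has seen data outside its own partition, retraining the affected adapter on its remaining documents is what would back the removal guarantee in this sketch.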

Paper Structure

This paper contains 26 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Motivating application of AdapterSwap. A mixture model selects the adapters most relevant to each user's query, subject to that user's access controls (indicated by shapes). Selected adapters are then combined and applied to a base model to produce personalized responses for each user.
  • Figure 2: AdapterSwap overview. Individual adapters are trained on partitioned access-control groups. A retriever model is fit using LDA and a GMM over SBERT representations. If data is removed, only the impacted adapter requires retraining.
  • Figure 3: (a) Average training time for a single adapter given the data partition size. (b) Total number of adapters needed per data partition size for 1,064,304 documents divided equally among partitions. (c) Observed perplexity per partition size when partitioning dataset by domain. (d) Total GPU hours required to train all adapters using an equal partition size per adapter.
  • Figure 4: Perplexity on the first month of data, measured after month-by-month training. FT indicates chronological fine-tuning as data becomes available. RT indicates retraining with new data and all preceding data. AS indicates AdapterSwap performance using the first-month adapter.
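
The relationship between panels (a), (b), and (d) of Figure 3 is simple arithmetic, sketched below. The document count comes from the caption; the partition sizes and the per-adapter training time passed in are hypothetical inputs, not measured values, and the function names are made up for illustration.

```python
import math

TOTAL_DOCS = 1_064_304  # document count stated in the Figure 3(b) caption


def adapters_needed(partition_size: int, total_docs: int = TOTAL_DOCS) -> int:
    """Number of adapters when documents are split into equal-size partitions."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    # Round up: a final, smaller partition still needs its own adapter.
    return math.ceil(total_docs / partition_size)


def total_gpu_hours(partition_size: int, hours_per_adapter: float,
                    total_docs: int = TOTAL_DOCS) -> float:
    """Total training cost = adapters needed x average time per adapter.

    hours_per_adapter is a caller-supplied assumption, not a figure
    taken from the paper.
    """
    return adapters_needed(partition_size, total_docs) * hours_per_adapter


# Hypothetical partition sizes, for illustration only.
for size in (1_000, 100_000, TOTAL_DOCS):
    print(size, adapters_needed(size))
```

Smaller partitions mean more adapters but a cheaper retrain when a single document is removed, which is the trade-off the four panels explore.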