AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme
TL;DR
AdapterSwap introduces a parameter-efficient continual learning framework that partitions data into low-rank adapters (LoRAs) and composes them at inference to enforce data access-control and data removal guarantees while preserving prior knowledge. A retrieval model using SBERT embeddings, LDA, and a Gaussian Mixture Model selects relevant adapters during inference, so that the top one, two, or three adapters a user is permitted to access can be mixed. Empirical results across Falcon-7B, Gemma-7B, Llama-2-7B, and Mistral-7B show strong retrieval of appropriate adapters (top-1 69–81%, top-3 93–95%), efficient data removal (adapter retraining up to 80× cheaper than full retraining), and reduced forgetting compared to chronological fine-tuning. The approach offers practical benefits for organizations requiring dynamic data management in LLM deployments, and the authors suggest future work on improved retrieval, adapter mixing, and integration with RAG frameworks.
Abstract
Large language models (LLMs) are increasingly capable of completing knowledge-intensive tasks by recalling information from a static pretraining corpus. Here we are concerned with LLMs in the context of evolving data requirements. For instance: batches of new data that are introduced periodically; subsets of data with user-based access controls; or requirements on dynamic removal of documents with guarantees that associated knowledge cannot be recalled. We wish to satisfy these requirements while at the same time ensuring a model does not forget old information when new data becomes available. To address these issues, we introduce AdapterSwap, a training and inference scheme that organizes knowledge from a data collection into a set of low-rank adapters, which are dynamically composed during inference. Our experiments demonstrate AdapterSwap's ability to support efficient continual learning, while also enabling organizations to have fine-grained control over data access and deletion.
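To make the bookkeeping behind this scheme concrete, the following is a minimal sketch, not the authors' implementation. It assumes each adapter is tied to one data partition and one access group, and that some retriever has already produced a relevance score per adapter for the current query. The names `Adapter`, `select_adapters`, `remove_document`, the `access_group` field, and the example scores are all hypothetical; no LoRA training or retrieval model is included.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical stand-in for one low-rank adapter trained on one data partition."""
    name: str
    access_group: str                      # assumed: one access group per partition
    doc_ids: set = field(default_factory=set)
    needs_retraining: bool = False


def select_adapters(scores, adapters, user_groups, k=3):
    """Return up to k adapter names with the highest score,
    considering only adapters the user is permitted to access."""
    allowed = [a for a in adapters if a.access_group in user_groups]
    ranked = sorted(allowed, key=lambda a: scores.get(a.name, 0.0), reverse=True)
    return [a.name for a in ranked[:k]]


def remove_document(doc_id, adapters):
    """Drop a document from its partition and flag only the affected
    adapters for retraining; all other adapters are left untouched."""
    affected = []
    for a in adapters:
        if doc_id in a.doc_ids:
            a.doc_ids.discard(doc_id)
            a.needs_retraining = True
            affected.append(a.name)
    return affected


adapters = [
    Adapter("A", "public", {"d1", "d2"}),
    Adapter("B", "finance", {"d3"}),
    Adapter("C", "public", {"d4"}),
]

# Hypothetical retriever scores for one query (higher means more relevant).
scores = {"A": 0.2, "B": 0.7, "C": 0.1}

# A user with only "public" access never sees adapter B, however relevant it is.
selected = select_adapters(scores, adapters, user_groups={"public"}, k=2)

# Removing document d3 only touches the adapter whose partition held it.
affected = remove_document("d3", adapters)
```

In this sketch the access check runs before ranking, so an adapter the user may not access can never enter the mixture, and a removal request invalidates a single adapter instead of the whole model. How the selected adapters are actually combined at inference is outside the scope of the sketch.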
