PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning
Romain Cosentino
TL;DR
PLATE addresses catastrophic forgetting in data-free continual learning of pretrained foundation models by exploiting geometric redundancy to protect old-task behavior while concentrating plasticity on redundant channels. It parameterizes updates as $\Delta W = B A Q^\top$, with $B$ selecting redundant output channels and $Q$ spanning a weight-derived low-energy input subspace, both computed from frozen weights. The approach yields a tunable retention-plasticity trade-off through two knobs, $r$ (number of trainable output channels) and $\tau$ (input-energy threshold), and demonstrates competitive or superior retention compared with LoRA across language, vision, and synthetic benchmarks, including out-of-distribution LLM specialization. This data-free method enables scalable, geometry-aware continual adaptation of large models by offering explicit control over forgetting and task performance, with practical implications for foundation-model deployment where old-task data are unavailable.
Abstract
We develop a continual learning method for pretrained models that \emph{requires no access to old-task data}, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial \emph{geometric redundancy}, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for \emph{where} to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to \textsc{PLATE} (\textbf{Pla}sticity-\textbf{T}unable \textbf{E}fficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at https://github.com/SalesforceAIResearch/PLATE.
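To make the structure of the update $\Delta W = B A Q^\top$ concrete, the following NumPy sketch builds frozen factors $B$ and $Q$ from a single frozen weight matrix and forms the update from a trainable $A$. It is an illustration under stated assumptions, not the released implementation: the function name `plate_factors`, the redundancy score (maximum absolute cosine similarity between rows of $W$), and the reading of $\tau$ as the fraction of spectral energy of $W$ to protect are all hypothetical choices made here for illustration and may differ from the criteria used in the paper.

```python
import numpy as np


def plate_factors(W, r, tau):
    """Build frozen factors B and Q from a frozen weight W of shape (d_out, d_in).

    ASSUMPTIONS (illustrative only, not taken from the paper):
      * a row (output channel) counts as redundant if it is nearly
        collinear with some other row of W;
      * Q spans the right-singular directions of W left over after the
        top directions holding a fraction tau of the spectral energy.
    """
    d_out, d_in = W.shape

    # Redundancy proxy: largest |cosine similarity| to any other row.
    rows = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(rows @ rows.T)
    np.fill_diagonal(sim, 0.0)
    idx = np.argsort(-sim.max(axis=1))[:r]  # r most redundant channels

    # B is a column-selection matrix: it routes updates to those r rows only.
    B = np.zeros((d_out, r))
    B[idx, np.arange(r)] = 1.0

    # Low-energy input subspace from the SVD of the frozen weight.
    _, s, Vt = np.linalg.svd(W, full_matrices=True)
    energy = np.zeros(d_in)
    energy[: len(s)] = s ** 2
    cum = np.cumsum(energy) / energy.sum()
    k = int(np.searchsorted(cum, tau)) + 1  # top-k directions are protected
    k = min(k, d_in - 1)                    # keep at least one free direction
    Q = Vt[k:].T                            # shape (d_in, d_in - k), orthonormal columns
    return B, Q, idx


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))             # stand-in for a pretrained layer
B, Q, idx = plate_factors(W, r=3, tau=0.9)

A = rng.standard_normal((B.shape[1], Q.shape[1]))  # the only trainable factor
delta_W = B @ A @ Q.T                              # same shape as W
W_adapted = W + delta_W
```

In this sketch, increasing `r` lets more output channels change, while increasing `tau` shrinks the input subspace in which they may change, which is one way to picture the two knobs trading plasticity against retention. By construction, rows of `delta_W` outside `idx` are exactly zero, and `delta_W` annihilates the protected high-energy input directions of `W`.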
