RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks

Nazia Tasnim, Bryan A. Plummer

TL;DR

RECAST introduces a drastically parameter-efficient approach to incremental learning that reparameterizes pretrained weights through shared template banks and module-specific coefficients, together with a Neural Mimicry pipeline that reconstructs those weights without extensive pretraining. The framework achieves per-task updates with fewer than 50 trainable parameters and is architecture-agnostic, demonstrating consistent improvements over the state-of-the-art across CNN and ViT backbones on six datasets. Key contributions include the Weight Decomposition mechanism, the Neural Mimicry training objective, and a scalable Parametric Scaling Strategy that yields substantial memory savings while preserving expressiveness. The work shows that combining template diversity with minimal coefficient updates can robustly adapt to sequential tasks and can complement existing adapters for additional gains.

Abstract

Incremental learning aims to adapt to new sets of categories over time with minimal computational overhead. Prior work often addresses this task by training efficient task-specific adaptors that modify frozen layer weights or features to capture relevant information without affecting predictions on previously learned categories. While these adaptors are generally more efficient than finetuning the entire network, they still require tens to hundreds of thousands of task-specific trainable parameters even for relatively small networks, making it challenging to operate in resource-constrained environments with high communication costs, such as edge devices or mobile phones. Thus, we propose Reparameterized, Compact weight Adaptation for Sequential Tasks (RECAST), a novel method that dramatically reduces task-specific trainable parameters to fewer than 50, several orders of magnitude fewer than competing methods like LoRA. RECAST accomplishes this efficiency by learning to decompose layer weights into a soft parameter-sharing framework consisting of shared weight templates and very few module-specific scaling factors or coefficients. This soft parameter-sharing framework allows for effective task-wise reparameterization by tuning only these coefficients while keeping templates frozen. A key innovation of RECAST is the novel weight reconstruction pipeline called Neural Mimicry, which eliminates the need for pretraining from scratch. This allows for high-fidelity emulation of existing pretrained weights within our framework and provides quick adaptability to any model scale and architecture. Extensive experiments across six datasets demonstrate that RECAST outperforms the state-of-the-art by up to 3% across various scales, architectures, and parameter spaces. Moreover, we show that RECAST's architecture-agnostic nature allows for seamless integration with existing methods, further boosting performance.
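The decomposition described in the abstract can be illustrated with a small NumPy sketch: a bank of frozen templates is shared across modules, and each module's weight is a linear combination of those templates under its own coefficients. This is a minimal illustration, not the authors' implementation. The function name `generate_weight`, the template count `K = 4`, the `8 x 8` layer shape, and the six target modules are all assumptions chosen so that the per-task parameter count stays below 50.

```python
import numpy as np

def generate_weight(templates, coeffs):
    """Linearly combine K templates of shape (K, d_out, d_in) using K coefficients."""
    return np.tensordot(coeffs, templates, axes=1)

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
K, d_out, d_in = 4, 8, 8
n_modules = 6

# Shared template bank: kept frozen during task-incremental training.
templates = rng.standard_normal((K, d_out, d_in))

# Per-task coefficients: the only trainable parameters for a new task.
task_coeffs = rng.standard_normal((n_modules, K))

# One generated weight matrix per target module.
weights = [generate_weight(templates, c) for c in task_coeffs]

n_trainable = task_coeffs.size      # K * n_modules coefficients per task
n_frozen = templates.size           # shared across all tasks
print(f"trainable per task: {n_trainable}, frozen shared: {n_frozen}")
```

Under these assumed sizes, a new task stores only the coefficient array, while the template bank is reused unchanged for every task.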

Paper Structure

This paper contains 27 sections, 10 equations, 10 figures, 6 tables, 2 algorithms.

Figures (10)

  • Figure 1: (a) Existing IL methods (Rehearsal, Regularization, Reconfiguration) exhibit various limitations in terms of model complexity, memory requirements, and training overheads. In comparison, our proposed method, (b) RECAST, can be used as a frozen backbone, allowing efficient reparameterization of any target module with a negligible number of parameter updates (order of $10^{-6}$), and can accommodate any number of disjoint tasks.
  • Figure 2: RECAST decomposes module weights into templates and coefficients, which are learned through reconstruction from pretrained weights (Section \ref{subsec: neural-mimic}). These components are linearly combined to dynamically generate weights for a target layer (Section \ref{subsec: method-math}). During TIL training, the templates are kept frozen, and only task-specific coefficients are learned to create new layer weights.
  • Figure 3: Averaged best Top-1 accuracy across six datasets for ViT models of varying scales, comparing Baseline, RECAST, and RECAST with Coefficient fine-tuning (FT) approaches.
  • Figure 4: Comparing averaged best Top-1 accuracy across six datasets for ViT-Small across various model configurations. We find RECAST excels in ultra-low parameter ranges (24-96) where LoRA struggles. RECAST w/ LoRA improves performance across all parameter ranges, offering complementary advantages.
  • Figure 5: Plot showing average classification accuracy of models reconstructed in different ways. VAE-reconstructed models performed slightly worse than the rest, with Smooth L1 loss providing the best performance.
  • ...and 5 more figures
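
The captions of Figures 2 and 5 describe learning templates and coefficients by reconstructing pretrained weights, with Smooth L1 loss performing best. The idea can be sketched as plain gradient descent on a single weight matrix. This is a simplified illustration under stated assumptions, not the paper's Neural Mimicry algorithm: the function names `smooth_l1` and `mimic`, the random stand-in for a pretrained weight, the template count, the initialization, the learning rate, and the step count are all hypothetical.

```python
import numpy as np

def smooth_l1(residual):
    """Smooth L1 (Huber-style) loss with threshold 1, averaged over all elements."""
    a = np.abs(residual)
    return np.mean(np.where(a < 1.0, 0.5 * a**2, a - 0.5))

def mimic(target, n_templates=4, steps=1000, lr=0.2, seed=0):
    """Fit templates and coefficients so their linear combination approximates
    a 2-D `target` weight matrix. All hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    templates = 0.1 * rng.standard_normal((n_templates,) + target.shape)
    coeffs = np.ones(n_templates)
    losses = []
    for _ in range(steps):
        residual = np.tensordot(coeffs, templates, axes=1) - target
        losses.append(smooth_l1(residual))
        # Gradient of the mean Smooth L1 loss w.r.t. the reconstructed weight:
        # the residual clipped to [-1, 1], divided by the number of elements.
        g = np.clip(residual, -1.0, 1.0) / residual.size
        # Chain rule through W_hat = sum_k coeffs[k] * templates[k].
        grad_coeffs = np.tensordot(templates, g, axes=([1, 2], [0, 1]))
        grad_templates = coeffs[:, None, None] * g
        coeffs -= lr * grad_coeffs
        templates -= lr * grad_templates
    return templates, coeffs, losses

# Stand-in for a pretrained layer weight (random here, purely for illustration).
pretrained = np.random.default_rng(1).standard_normal((8, 8))
templates, coeffs, losses = mimic(pretrained)
print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.4f}")
```

Once the reconstruction is fitted, the templates would be frozen and only the coefficients tuned for each new task, as the Figure 2 caption describes.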