Retrofit: Continual Learning with Bounded Forgetting for Security Applications

Yiling He, Junchi Lei, Hongyu She, Shuo Shao, Xinran Zheng, Yiping Liu, Zhan Qin, Lorenzo Cavallaro

TL;DR

This work addresses continual learning in data-sensitive security contexts where historical data cannot be retained. It introduces Retrofit, a data-retrospective-free CL method that consolidates old and new knowledge through parameter-level merging and enforces bounded forgetting via low-rank updates and confidence-driven sparse arbitration. Empirically, Retrofit demonstrates superior retention and adaptability on two security-relevant tasks, malware detection under temporal drift and binary analysis under representation shift, outperforming standard CL baselines and, in several cases, oracle full-data retraining. The approach offers a scalable, privacy-preserving pathway for trustworthy continual learning in security, with potential extensions to active learning and cross-domain code analysis. Formally, Retrofit uses masked low-rank updates of the form $\Delta W_t = A_t (M_t \odot B_t)$, accumulated as $W_T = W_0 + \sum_{t=1}^T \Delta W_t$, and bounds forgetting via $E_t = \|M_t \odot B_t\|_F$ together with confidence-guided arbitration losses that balance stability and plasticity.
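The update rule in the TL;DR can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the shapes ($A_t$ of size $d_{out} \times r$, $B_t$ and $M_t$ of size $r \times d_{in}$), the binary random mask, and the function names `masked_low_rank_update`, `update_magnitude`, and `accumulate` are all hypothetical. How Retrofit actually selects $M_t$ is not specified in this summary.

```python
import numpy as np

def masked_low_rank_update(A, B, M):
    """Delta W_t = A_t (M_t ⊙ B_t); M is assumed to be a 0/1 mask shaped like B."""
    return A @ (M * B)

def update_magnitude(B, M):
    """E_t = ||M_t ⊙ B_t||_F, the Frobenius norm of the masked factor."""
    return float(np.linalg.norm(M * B, ord="fro"))

def accumulate(W0, updates):
    """W_T = W_0 + sum_t Delta W_t."""
    W = W0.copy()
    for dW in updates:
        W = W + dW
    return W

# Toy run with hypothetical sizes and a random mask (illustration only).
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 2
W0 = rng.standard_normal((d_out, d_in))
updates = []
for t in range(3):
    A = rng.standard_normal((d_out, r))                 # low-rank factor A_t
    B = rng.standard_normal((r, d_in))                  # low-rank factor B_t
    M = (rng.random((r, d_in)) < 0.5).astype(B.dtype)   # sparse 0/1 mask M_t
    updates.append(masked_low_rank_update(A, B, M))
W_T = accumulate(W0, updates)
```

Each $\Delta W_t$ has rank at most $r$ by construction, and zeroing entries of $B_t$ through the mask can only shrink $E_t$, which is the quantity the TL;DR ties to bounded forgetting.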

Abstract

Modern security analytics are increasingly powered by deep learning models, but their performance often degrades as threat landscapes evolve and data representations shift. While continual learning (CL) offers a promising paradigm to maintain model effectiveness, many approaches rely on full retraining or data replay, which are infeasible in data-sensitive environments. Moreover, existing methods remain inadequate for security-critical scenarios, facing two coupled challenges in knowledge transfer: preserving prior knowledge without old data and integrating new knowledge with minimal interference. We propose RETROFIT, a data retrospective-free continual learning method that achieves bounded forgetting for effective knowledge transfer. Our key idea is to consolidate previously trained and newly fine-tuned models, serving as teachers of old and new knowledge, through parameter-level merging that eliminates the need for historical data. To mitigate interference, we apply low-rank and sparse updates that confine parameter changes to independent subspaces, while a knowledge arbitration mechanism dynamically balances the teacher contributions guided by model confidence. Our evaluation on two representative applications demonstrates that RETROFIT consistently mitigates forgetting while maintaining adaptability. In malware detection under temporal drift, it substantially improves the retention score, from 20.2% to 38.6% over CL baselines, and exceeds the oracle upper bound on new data. In binary summarization across decompilation levels, where analyzing stripped binaries is especially challenging, RETROFIT achieves around twice the BLEU score of transfer learning used in prior work and surpasses all baselines in cross-representation generalization.

Paper Structure

This paper contains 30 sections, 4 theorems, 14 equations, 9 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Let $g_{\text{old}} = \nabla_\theta \mathcal{L}_{\text{old}}(\theta_{\text{prev}})$ denote the gradient of a past task at the current model. For any old-task gradient $g_{\text{old}}$ and update $\Delta W_t = A_tB_t$, the expected cosine similarity between the new update and the old-task gradient scales with $r/D$, making destructive interference less likely as long as $r \ll D$.
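The $r/D$ scaling can be checked numerically with a standard fact about random subspaces. The sketch below is not the paper's proof; it assumes $r$ is the rank of the update and $D$ is the ambient parameter dimension (both inferred from context), and it models the update subspace as the span of a random Gaussian matrix. It measures the fraction of the squared norm of a fixed gradient that falls inside a random $r$-dimensional subspace, which equals the squared cosine between the gradient and its projection onto that subspace. The function name `captured_energy` is hypothetical.

```python
import numpy as np

def captured_energy(D, r, trials=200, seed=0):
    """Average fraction of ||g||^2 lying in a random r-dimensional subspace of R^D."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(D)
    g /= np.linalg.norm(g)                   # unit-norm stand-in for g_old
    fracs = []
    for _ in range(trials):
        G = rng.standard_normal((D, r))      # random directions spanning the subspace
        Q, _ = np.linalg.qr(G)               # orthonormal basis, shape (D, r)
        fracs.append(np.sum((Q.T @ g) ** 2)) # squared norm of the projection of g
    return float(np.mean(fracs))

# The average should sit close to r / D for each setting.
for r in (4, 8, 64):
    print(r, captured_energy(256, r), r / 256)
```

Under these assumptions the overlap shrinks linearly as $r$ decreases relative to $D$, which matches the intuition behind the proposition: a low-rank update touches only a small slice of the directions that matter to the old task.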

Figures (9)

  • Figure 1: Examples of the benefits and insufficiencies of CL in security applications. (a) Malware detection: continual fine-tuning mitigates temporal degradation compared to a static model trained on early data, but suffers from severe catastrophic forgetting; (b) Binary analysis: continual fine-tuning across abstraction levels outperforms transfer learning baselines in the most security-critical task, but effective knowledge accumulation and cross-representation robustness remain challenging.
  • Figure 2: Continual learning for addressing temporal shift (left) and representation shift (right) in security applications.
  • Figure 3: Design insights for accumulating old and new knowledge at each continual learning stage. Knowledge accumulation is achieved through parameter merging between the previous and newly adapted models, while interference control is ensured by applying masked low-rank updates.
  • Figure 4: Cumulative CL comparison in malware detection.
  • Figure 5:
  • ...and 4 more figures

Theorems & Definitions (4)

  • Proposition 1: Low-rank reduces expected interference by $r/D$
  • Proposition 2: Frozen random $A$ as near-isometry
  • Proposition 3: Model-level expected interference reduced by low-rank
  • Theorem 1: Model-level reduced interference & bounded forgetting