Lifelong Continual Learning for Anomaly Detection: New Challenges, Perspectives, and Insights

Kamil Faber, Roberto Corizzo, Bartlomiej Sniezynski, Nathalie Japkowicz

TL;DR

The paper defines lifelong anomaly detection as the integration of continual adaptation with knowledge retention for anomaly detectors operating in evolving environments. It formalizes a scenario-generation and evaluation framework, including Lifelong ROC-AUC, backward transfer (BWT), and forward transfer (FWT), to systematically benchmark detectors under lifelong conditions. Experiments across multiple datasets reveal a clear gap between traditional non-lifelong anomaly detectors and lifelong approaches, with replay-based strategies mitigating forgetting and improving transfer, though upper-bound multi-task experts (MSTE) often still outperform them. The work demonstrates the practical value of lifelong learning for anomaly detection in domains such as cybersecurity and cyber-physical systems, and provides open-source tools to spur broader adoption and development.

Abstract

Anomaly detection is of paramount importance in many real-world domains, characterized by evolving behavior. Lifelong learning represents an emerging trend, answering the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, limited efforts are dedicated to building foundations for lifelong anomaly detection, which provides intrinsically different challenges compared to the more widely explored classification setting. In this paper, we face this issue by exploring, motivating, and discussing lifelong anomaly detection, trying to build foundations for its wider adoption. First, we explain why lifelong anomaly detection is relevant, defining challenges and opportunities to design anomaly detection methods that deal with lifelong learning complexities. Second, we characterize learning settings and a scenario generation procedure that enables researchers to experiment with lifelong anomaly detection using existing datasets. Third, we perform experiments with popular anomaly detection methods on proposed lifelong scenarios, emphasizing the gap in performance that could be gained with the adoption of lifelong learning. Overall, we conclude that the adoption of lifelong anomaly detection is important to design more robust models that provide a comprehensive view of the environment, as well as simultaneous adaptation and knowledge retention.

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures, 3 tables, and 2 algorithms.

Figures (9)

  • Figure 1: General view of lifelong learning in terms of challenges (see Section \ref{sec:lifelong_anomaly_detection}), perspectives (see Section \ref{sec:scenarios}), and insights (see Section \ref{sec:discussion}) examined in our paper.
  • Figure 2: A scenario with four recurring tasks $(T_1, T_2, T_3, T_4)$. Conventional anomaly detection requires constant model updates and results in detection delays. Lifelong learning mitigates this burden by retaining knowledge of tasks.
  • Figure 3: Comparison of training/update and inference for non-lifelong and lifelong anomaly detection in the scenario with four tasks $(T_1, T_2, T_3, T_4)$. In non-lifelong anomaly detection, the model forgets the previous tasks as soon as a new task is learned (left -- top). In contrast, the lifelong anomaly detection model aims to retain knowledge of all tasks (left -- bottom). This characteristic has a serious impact on the model's behavior during inference (right). In non-lifelong anomaly detection, after learning task $T_4$, the model misclassifies data from previous tasks as anomalous since it considers only data from the current task as normal behavior (right -- top). On the other hand, the ideal lifelong anomaly detection model retains the knowledge of all tasks, preventing the misclassification of normal data from previous tasks as anomalous (right -- bottom). This difference in behavior between non-lifelong and lifelong anomaly detection may lead to a discrepancy in their performance scores.
  • Figure 4: Lifelong evaluation protocol. The model handles a sequence of concepts $i \in \{1, 2, 3\}$. For each concept $i$, the model is trained on training set $T_i$ (learning phase). After each learning phase, the evaluation phase is triggered, in which the model's anomaly detection performance (in terms of ROC-AUC) is evaluated on the testing sets $E_j$ of all concepts (previous, current, and future). The evaluation protocol produces a matrix $R$, in which the entry $R_{i, j}$ represents the model's ROC-AUC on concept $j$ after learning concept $i$. This matrix is used to compute the final metric values, such as Lifelong ROC-AUC, BWT, and FWT.
  • Figure 5: Lifelong scenario variants based on different choices of the concept creation functions $\gamma$ and $\lambda$: i) clustered anomaly concepts assigned to the closest normal concept (CC), ii) clustered anomaly concepts assigned randomly to normal concepts (CR), and iii) anomalies randomly assigned to normal concepts (R).
  • ...and 4 more figures
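
The evaluation protocol in the Figure 4 caption can be illustrated with a short sketch of how summary metrics might be derived from the matrix $R$. The paper's own three equations are not reproduced in this excerpt, so the formulas below are an assumption: they follow one common formulation from the continual-learning literature, in which Lifelong ROC-AUC averages the entries with $i \ge j$ (concepts already seen), BWT averages $R_{i,j} - R_{j,j}$ over $i > j$, and FWT averages the entries with $i < j$ (concepts not yet seen). The function name `lifelong_metrics` and the dictionary keys are hypothetical, and the paper's normalization may differ.

```python
from statistics import mean


def lifelong_metrics(R):
    """Summarize a square results matrix R, where R[i][j] is the ROC-AUC
    on concept j measured after learning concept i (0-indexed here).

    Assumed formulation (not taken verbatim from the paper):
      lifelong_roc_auc: mean of R[i][j] over i >= j
      bwt:              mean of R[i][j] - R[j][j] over i > j
      fwt:              mean of R[i][j] over i < j
    """
    n = len(R)
    if n < 2 or any(len(row) != n for row in R):
        raise ValueError("R must be a square matrix with at least 2 concepts")

    # Performance on the current concept and on all previously learned ones.
    seen = [R[i][j] for i in range(n) for j in range(i + 1)]

    # Change on an earlier concept j relative to the score obtained
    # right after j was learned; negative values indicate forgetting.
    backward = [R[i][j] - R[j][j] for i in range(1, n) for j in range(i)]

    # Performance on concepts the model has not been trained on yet.
    forward = [R[i][j] for i in range(n) for j in range(i + 1, n)]

    return {
        "lifelong_roc_auc": mean(seen),
        "bwt": mean(backward),
        "fwt": mean(forward),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    R = [
        [0.9, 0.6, 0.5],
        [0.8, 0.9, 0.6],
        [0.7, 0.8, 0.9],
    ]
    for name, value in lifelong_metrics(R).items():
        print(f"{name}: {value:.4f}")
```

Under this formulation, a detector that forgets earlier concepts shows a negative BWT, which is the behavior Figure 3 attributes to non-lifelong anomaly detection.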