Develop End-to-End Anomaly Detection System
Emanuele Mengoli, Zhiyuan Yao, Wutao Wei
TL;DR
The paper addresses anomaly detection in networks under high data volatility and scarce labeled data by proposing an end-to-end development pipeline that continuously integrates user feedback. It introduces Lachesis, a particle-filter–based forecasting system for MAC-flap events, implemented in two versions (v0 and v1) and evaluated against standard baselines on real-world network data. The key contributions include a modular MLOps-like pipeline, two Lachesis variants tailored for forecasting and alerting, and a comprehensive evaluation across stationary, periodic, and volatile node cohorts demonstrating improved forecasting accuracy and reduced alert noise. Practically, the framework enables scalable, user-centered refinement of data-driven network anomaly products throughout their life cycle, enhancing robustness and operational efficiency.
Abstract
Anomaly detection plays a crucial role in ensuring network robustness. However, implementing intelligent alerting systems is challenging when anomalies can be caused by both malicious and non-malicious events, which makes anomaly patterns difficult to determine. The lack of labeled data in the computer networking domain further exacerbates this issue, impeding the development of robust models capable of handling real-world scenarios. To address this challenge, we propose an end-to-end anomaly detection model development pipeline. This framework makes it possible to consume user feedback and enables continuous user-centric model performance evaluation and optimization. We demonstrate the efficacy of the framework by introducing and benchmarking a new forecasting model, named \emph{Lachesis}, on a real-world networking problem. Experiments demonstrate the robustness and effectiveness of the two proposed versions of \emph{Lachesis} compared with other models proposed in the literature. Our findings underscore the potential for improving the performance of data-driven products over their life cycles through a harmonized integration of user feedback and iterative development.
