CyberSentinel: Efficient Anomaly Detection in Programmable Switch using Knowledge Distillation

Sankalp Mittal

TL;DR

CyberSentinel addresses the challenge of scalable, unseen-attack anomaly detection in IoT-heavy traffic by transferring knowledge from an ensemble of autoencoders into a lightweight iForest that can be deployed entirely in the programmable switch data plane. The approach uses burst-level features and a novel knowledge-distillation scheme to produce a small set of whitelist rules that enable line-rate detection on real hardware, while control-plane components handle offline model preparation and online rule updates. Empirical results show CyberSentinel achieves detection performance comparable to control-plane–augmented methods while delivering 66.47% higher throughput and 50% lower per-packet latency on a 40 Gbps link, and it demonstrates robustness to adversarial attacks. The work delivers a practical, unsupervised, data-plane–friendly solution for real-time anomaly detection at IoT-scale, with broad implications for scalable network security in programmable switches.

Abstract

The increasing volume of traffic (especially from IoT devices) poses a challenge to current anomaly detection systems. Existing systems must rely on the control plane for more thorough and accurate detection of malicious traffic (anomalies). This introduces latency in decisions about fast incoming traffic, so existing systems cannot scale to such growing traffic rates. In this paper, we propose CyberSentinel, a high-throughput and accurate anomaly detection system deployed entirely in the programmable switch data plane, making it the first work to accurately detect anomalies at line speed. To detect unseen network attacks, CyberSentinel uses a novel knowledge distillation scheme that incorporates the "learned" knowledge of deep unsupervised ML models (e.g., autoencoders) to develop an iForest model that is then installed in the data plane in the form of whitelist rules. We implement a prototype of CyberSentinel on a testbed with an Intel Tofino switch and evaluate it on various real-world use cases. CyberSentinel yields detection performance similar to state-of-the-art control plane solutions, but with a 66.47% increase in packet-processing throughput on a 40 Gbps link and a 50% reduction in average per-packet latency.
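To make the whitelist-rule idea concrete, here is a minimal Python sketch. It assumes each rule is a per-feature interval (a hyper-rectangle) over burst-level features, and that a feature vector matching no rule is flagged as anomalous. The feature names, the rule values, and the `matches` and `is_anomalous` helpers are hypothetical illustrations, not taken from the paper. On a real switch such rules would presumably be expressed as match-action table entries rather than Python.

```python
from typing import Dict, List, Tuple

# A rule maps each feature name to a half-open interval [low, high).
Rule = Dict[str, Tuple[float, float]]

# Hypothetical whitelist; the features and numbers are illustrative only.
WHITELIST: List[Rule] = [
    {"pkt_count": (1, 50), "mean_pkt_size": (40, 600)},
    {"pkt_count": (50, 500), "mean_pkt_size": (600, 1500)},
]


def matches(rule: Rule, features: Dict[str, float]) -> bool:
    """Return True if every feature falls inside the rule's interval."""
    return all(low <= features[name] < high
               for name, (low, high) in rule.items())


def is_anomalous(features: Dict[str, float],
                 whitelist: List[Rule] = WHITELIST) -> bool:
    """Traffic that matches no whitelist rule is treated as anomalous."""
    return not any(matches(rule, features) for rule in whitelist)
```

Under these assumptions, a burst with `pkt_count=10` and `mean_pkt_size=100` matches the first rule and is treated as benign, while a burst with `pkt_count=10` and `mean_pkt_size=1400` matches no rule and is flagged.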

Paper Structure

This paper contains 55 sections, 30 equations, 18 figures, 6 tables, 4 algorithms.

Figures (18)

  • Figure 1: Protocol Independent Switch Architecture (PISA).
  • Figure 2: Overview of CyberSentinel
  • Figure 3: Knowledge distillation of autoencoders into iForest and whitelist rules generation. (1) We first train an ensemble of autoencoders and (2) collect reconstruction errors by feeding each training sample into the trained ensemble. (3) We then train the iForest model itself. (4) We map each leaf node of the trained iForest's iTrees to its respective training samples. (5) Next, we embed each leaf node with the expected reconstruction error (over mapped training samples and augmented samples) according to Eq. 2. (6) We convert the combined reconstruction errors into a label (0 or 1) based on Eq. 3. (7) Meanwhile, we derive iTree hypercubes from the trained iForest by taking the Cartesian product of all possible feature boundaries for each iTree. (8) We then merge the iTree hypercubes into iForest hypercubes. (9) We label each iForest hypercube by consulting the labeled iForest as per Eq. 4. (10) Once labeled, we merge adjacent hypercubes that have the same label. (11) Lastly, we derive whitelist rules for hypercubes whose $label = 0$.
  • Figure 4: Effect of various hyperparameters on consistency of knowledge distillation algorithm.
  • Figure 5: TPR and TNR comparison of iForest, Magnifier, and Magnifier-distilled iForest. The distilled iForest retains the high TPR of iForest as well as the high TNR of Magnifier.
  • ...and 13 more figures
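
Steps (2) to (6) and (11) of the Figure 3 caption can be sketched for a single iTree as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the autoencoder ensemble is replaced by a stand-in `teacher_error` (squared distance from the training mean), the threshold is an arbitrary 90th-percentile cutoff rather than Eq. 3, empty leaves are simply labeled anomalous instead of using augmented samples, and the hypercube merging of steps (7) to (10) is omitted. All function names are hypothetical.

```python
import random
from statistics import mean


def teacher_error(x, center):
    # Stand-in for the ensemble's reconstruction error (assumption):
    # squared distance of the sample from the training mean.
    return sum((a - c) ** 2 for a, c in zip(x, center))


def build_itree(samples, bounds, depth, max_depth, rng):
    """Grow one isolation tree; each leaf keeps its hypercube and samples."""
    if depth >= max_depth or len(samples) <= 1:
        return {"leaf": True, "bounds": bounds, "samples": samples}
    dim = rng.randrange(len(bounds))
    values = [s[dim] for s in samples]
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"leaf": True, "bounds": bounds, "samples": samples}
    split = rng.uniform(lo, hi)  # random split point, as in isolation forests
    left = [s for s in samples if s[dim] < split]
    right = [s for s in samples if s[dim] >= split]
    left_bounds, right_bounds = list(bounds), list(bounds)
    left_bounds[dim] = (bounds[dim][0], split)
    right_bounds[dim] = (split, bounds[dim][1])
    return {"leaf": False, "dim": dim, "split": split,
            "left": build_itree(left, left_bounds, depth + 1, max_depth, rng),
            "right": build_itree(right, right_bounds, depth + 1, max_depth, rng)}


def leaves(node):
    """Yield every leaf node of the tree."""
    if node["leaf"]:
        yield node
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])


def distill(tree, center, threshold):
    """Label each leaf from the teacher's mean error; return whitelist cubes."""
    rules = []
    for leaf in leaves(tree):
        if leaf["samples"]:
            expected = mean(teacher_error(s, center) for s in leaf["samples"])
        else:
            expected = float("inf")  # assumption: empty region is anomalous
        leaf["label"] = 1 if expected > threshold else 0
        if leaf["label"] == 0:
            rules.append(leaf["bounds"])  # one interval per feature
    return rules


rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
center = tuple(mean(col) for col in zip(*train))
errors = sorted(teacher_error(x, center) for x in train)
threshold = errors[int(0.9 * len(errors))]  # illustrative cutoff only
tree = build_itree(train, [(-10.0, 10.0)] * 2, 0, 6, rng)
rules = distill(tree, center, threshold)
```

Each entry in `rules` is a list of per-feature `(low, high)` intervals describing a region that the stand-in teacher considers benign. A full pipeline in the spirit of Figure 3 would also combine the cubes across all iTrees and merge adjacent cubes with the same label before emitting rules.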