A Grid Based Adversarial Clustering Algorithm

Wutao Wei, Nikhil Gupta, Bowei Xi

TL;DR

This paper develops a novel grid based adversarial clustering algorithm that identifies the core normal regions and draws defensive walls around the centers of the normal objects using game theoretic ideas.

Abstract

More and more data are being gathered to detect and prevent cyber attacks. In cyber security applications, data analytics techniques must contend with active adversaries who try to deceive the models and avoid detection. Such adversarial behavior motivates the development of robust and resilient adversarial learning techniques for various tasks. Most previous work has focused on adversarial classification techniques, which assume the existence of a reasonably large number of carefully labeled data instances. In practice, however, labeling data instances often requires costly and time-consuming human expertise and becomes a significant bottleneck. Meanwhile, a large number of unlabeled instances can also be used to understand the adversaries' behavior. To address these challenges, we develop a novel grid based adversarial clustering algorithm. The algorithm identifies the core normal regions and draws defensive walls around the centers of the normal objects using game theoretic ideas. It also identifies sub-clusters of attack objects, the overlapping areas within clusters, and outliers that may be potential anomalies.
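
The abstract does not spell out how the grid is built, so the following is only a loose illustration of the general grid-based idea, not the paper's algorithm: points are binned into square cells, and cells with few points are treated as outliers. The function names, `cell_size`, `min_count`, and the sample points are all hypothetical choices made for this sketch; the game theoretic "defensive wall" step is not modeled here.

```python
from collections import Counter

def grid_cells(points, cell_size):
    """Map each 2-D point to the integer index of the grid cell containing it."""
    return [(int(x // cell_size), int(y // cell_size)) for x, y in points]

def label_by_density(points, cell_size, min_count):
    """Label a point 'dense' if its cell holds at least min_count points,
    otherwise 'outlier'. This is a generic density rule (an assumption),
    not the rule used in the paper."""
    cells = grid_cells(points, cell_size)
    counts = Counter(cells)  # number of points falling in each cell
    return ["dense" if counts[c] >= min_count else "outlier" for c in cells]

# Hypothetical toy data: four nearby points and one isolated point.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.4, 0.3), (5.5, 5.5)]
labels = label_by_density(points, cell_size=1.0, min_count=3)
print(labels)
```

In this toy run the four nearby points share one cell and are marked dense, while the isolated point is marked an outlier. A real adversarial setting would also need the labeled instances and the attacker model that the abstract refers to.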

Paper Structure

This paper contains 8 sections, 1 equation, 6 figures, 2 algorithms.

Figures (6)

  • Figure 1: Left to right: 1) Simulation 1 true labels; 2) Simulation 2 true labels; 3) Simulation 3 true labels
  • Figure 2: Simulation 1 comparison with $\alpha=0.6$. Left to right: 1) ADClust with $k=10$; 2) ADClust with $k=20$; 3) EM least square; 4) S4VM
  • Figure 3: Simulation 2 comparison with $\alpha=0.6$. Left to right: 1) ADClust with $k=10$; 2) ADClust with $k=20$; 3) EM least square; 4) S4VM
  • Figure 4: Simulation 3 comparison with $\alpha=0.6$. Left to right: 1) ADClust with $k=10$; 2) ADClust with $k=20$; 3) EM least square; 4) S4VM
  • Figure 5: Quantitative measures as the weight $k$ increases from 1 to 100. Left Column: Top panel is percent of abnormal points in mixed region and bottom panel is percent of abnormal points among outliers; Right Column: Top panel is the number of points in mixed region and bottom panel is the number of points as outliers.
  • ...and 1 more figures