FedAT: Federated Adversarial Training for Distributed Insider Threat Detection

R G Gayathri; Atul Sajjanhar; Md Palash Uddin; Yong Xiang

TL;DR

This work investigates a Federated Learning (FL)-enabled multiclass Insider Threat Detection (ITD) paradigm that accounts for non-Independent and Identically Distributed (non-IID) data when detecting insider threats across the different locations (clients) of an organization. It proposes a Federated Adversarial Training (FedAT) approach that uses a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients.

Abstract

Insider threats usually originate within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions that entities take on the resources to which they have access rights allows us to identify insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches has gained attention in the last few years. However, most techniques employ centralized ML methods to perform ITD. Organizations operating from multiple locations cannot contribute to centralized models because the data is generated at various locations. In particular, user behavior data, the primary source for ITD, cannot be shared among locations due to privacy concerns. Additionally, the data distributed across locations exhibits extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, has gained much interest recently. However, FL-enabled ITD has not yet been explored, and research is still needed on the significant issues of implementing it in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data distribution to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients. In addition, we propose a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with the benchmarks to demonstrate the enhanced performance of the proposed FedAT-driven ITD scheme.
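
As a rough illustration of the server-side step in FL, the sketch below averages client parameter vectors weighted by local sample counts. The abstract does not specify the aggregation rule, so the FedAvg-style weighting, the function name `aggregate_client_updates`, and the flat-list parameter format are all assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Sequence


def aggregate_client_updates(
    client_params: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-count-weighted average of flat parameter vectors.

    Assumption: FedAvg-style weighting; the source does not confirm
    which aggregation rule FedAT uses.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share one parameter shape")

    aggregated = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        share = size / total  # this client's share of all samples
        for i, value in enumerate(params):
            aggregated[i] += share * value
    return aggregated


# Hypothetical example: two clients, the second holding three times more data.
global_params = aggregate_client_updates([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count lets larger sites pull the global model harder. Under extreme non-IID skew this can under-represent rare attack classes held by small clients, which is the kind of imbalance the abstract says FedAT targets with generative augmentation.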

Paper Structure (22 sections, 6 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Preliminaries
Insider Threat Detection in Distributed Setting
Federated Learning
Local Model Update in FL
Global Model Aggregation in FL
Adversarial Training
Motivation
Proposed Approach
Approach Overview
Distributed Feature Space Generation
FedAT
Federated Data Augmentation using AT
Multiclass Classification using SNN-MLP
Experiment and Result Analysis
...and 7 more sections

Figures (7)

Figure 1: FedAT-enabled ITD framework. The first step is feature space generation which performs the feature extraction and data preparation. The next step is the FedAT in the local clients, followed by model aggregation in the server.
Figure 2: Non-IID insider threat data generation. Each client performs the feature space generation independently and creates a context-based feature vector using the log files available at the respective site. Classes are labeled after the scenarios that result in an insider threat.
Figure 3: FL with local GAN augmentation for ITD. As the first step in FedAT, the GAN training is employed by the G model and the C model. Later, the C model is used for multiclass classification. Finally, the gradients from the C model and the auxiliary information (the class labels) are shared with the server.
Figure 4: Network architectures of the SNN-MLP vs the classical MLP. (a) depicts the proposed SNN-MLP model, where SELU activation and AlphaDropout are adopted. (b) illustrates the adoption of the ReLU activation and the standard Dropout in the classical MLP model.
Figure 5: Non-IID data distribution for the CERT datasets.
...and 2 more figures
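
The Figure 4 caption contrasts SELU in the SNN-MLP with ReLU in the classical MLP. The sketch below shows the two activations side by side in plain Python. The constants are the standard SELU values from the self-normalizing network literature. The function names are illustrative, and AlphaDropout is not implemented here.

```python
import math

# Standard SELU constants (fixed values, not learned parameters).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit.

    Positive inputs are scaled linearly; negative inputs saturate
    toward -SELU_SCALE * SELU_ALPHA instead of being zeroed.
    """
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * math.expm1(x)


def relu(x: float) -> float:
    """Rectified linear unit: zero for all negative inputs."""
    return max(0.0, x)


# Compare the two activations on a few sample inputs.
for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={value:+.1f}  selu={selu(value):+.4f}  relu={relu(value):+.4f}")
```

Unlike ReLU, SELU keeps a bounded negative output, which is what lets activations settle toward zero mean and unit variance across layers. AlphaDropout is the matching dropout variant designed to preserve that property.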
