LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Zheli Liu, Xiaojie Yuan

TL;DR

This work tackles real-time insider threat detection at the activity level, addressing the coarse granularity and post-hoc limitation of prior ITD approaches. It proposes LAN, a three-component framework that jointly models temporal activity sequences, learns adaptive activity graphs, and predicts anomaly scores using a graph neural network, all guided by a novel hybrid loss to handle extreme class imbalance. LAN demonstrates consistent improvements over nine baselines for real-time ITD and eight baselines for post-hoc ITD on CERT r4.2 and r5.2, with notable reductions in false positives and low-latency inference. The framework’s use of graph structure learning to avoid manual graph construction, together with its balance-aware loss, supports scalable, fine-grained insider-threat detection suitable for practical deployment.

Abstract

Enterprises and organizations face potential threats from insider employees that may lead to serious consequences. Previous studies on insider threat detection (ITD) mainly focus on detecting abnormal users or abnormal time periods (e.g., a week or a day). However, a user may have hundreds of thousands of activities in the log, and even a single day may contain thousands of activities for one user, so verifying the flagged users or periods requires a high investigation budget. In addition, existing works are mainly post-hoc methods rather than real-time detectors, so they cannot report insider threats in time, before they cause loss. In this paper, we conduct the first study towards real-time ITD at the activity level, and present a fine-grained and efficient framework, LAN. Specifically, LAN simultaneously learns the temporal dependencies within an activity sequence and the relationships between activities across sequences with graph structure learning. Moreover, to mitigate the data imbalance problem in ITD, we propose a novel hybrid prediction loss, which integrates self-supervision signals from normal activities and supervision signals from abnormal activities into a unified loss for anomaly detection. We evaluate the performance of LAN on two widely used datasets, i.e., CERT r4.2 and CERT r5.2. Extensive comparative experiments demonstrate the superiority of LAN, which outperforms 9 state-of-the-art baselines by at least 9.92% and 6.35% in AUC for real-time ITD on CERT r4.2 and r5.2, respectively. LAN can also be applied to post-hoc ITD, surpassing 8 competitive baselines by at least 7.70% and 4.03% in AUC on the two datasets. Finally, the ablation study, parameter analysis, and compatibility analysis evaluate the impact of each module and hyper-parameter in LAN. The source code can be obtained from https://github.com/Li1Neo/LAN.
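
The abstract describes the hybrid prediction loss only in words, so the following is a minimal sketch of one way such a loss could be written, not the authors' formulation. It assumes the model outputs logits over an activity vocabulary for each position, that normal activities contribute a self-supervised cross-entropy term on the observed activity, and that labeled abnormal activities contribute a term that pushes the predicted probability of the observed activity down. The names `hybrid_loss`, `batch_logits`, `observed`, `is_abnormal`, and the use of `r` as the weight on the abnormal term are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_loss(batch_logits, observed, is_abnormal, r=1.0, eps=1e-12):
    """Hypothetical hybrid loss (an assumption, not the paper's equation).

    batch_logits: list of logit vectors, one per activity
    observed:     index of the activity that actually occurred
    is_abnormal:  True where the activity is labeled abnormal
    r:            assumed weight on the abnormal (supervised) term
    """
    losses = []
    for logits, y, abnormal in zip(batch_logits, observed, is_abnormal):
        p = softmax(logits)[y]
        if abnormal:
            # Supervised signal: an abnormal activity should be hard to predict.
            losses.append(-r * math.log(1.0 - p + eps))
        else:
            # Self-supervised signal: a normal activity should be predictable.
            losses.append(-math.log(p + eps))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    logits = [[2.0, 0.1, 0.1], [0.2, 0.1, 3.0]]
    print(hybrid_loss(logits, observed=[0, 2], is_abnormal=[False, True], r=0.5))
```

Under this assumed form, an anomaly score for a new activity could be taken as one minus the predicted probability of the observed activity, and the weight `r` trades off the scarce abnormal labels against the abundant normal ones.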

Paper Structure

This paper contains 32 sections, 21 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: (a) Illustration of activity-level ITD. Activity-level ITD aims at discovering abnormal activities inside the system (shown in blue boxes). (b) Post-hoc ITD. It is usually deployed for retrospective detection, discovering abnormal activities in a past period. (c) Real-time ITD. It is usually applied to detect whether the current activity is abnormal.
  • Figure 2: Overall architecture of LAN.
  • Figure 3: ROC curves for real-time ITD and post-hoc ITD on two datasets
  • Figure 4: Performance of LAN with different numbers of candidate neighbors $k$ obtained through retrieval
  • Figure 5: Performance of LAN with different weights of negative feedback $r$ in the hybrid prediction loss
  • ...and 1 more figure
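
The caption of Figure 4 refers to $k$ candidate neighbors obtained through retrieval. How LAN performs this retrieval is not specified in this summary, so the sketch below only illustrates the general idea under stated assumptions: each activity is represented by an embedding vector, candidates are drawn from a pool of past activities (so nothing from the future is used in the real-time setting), and similarity is measured with cosine similarity. The function names and the choice of cosine similarity are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_neighbors(query, pool, k):
    """Return the k pool entries most similar to `query` as (index, score) pairs.

    `pool` is assumed to hold embeddings of past activities only.
    Ties are broken by the lower pool index.
    """
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(pool)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    for idx, score in top_k_neighbors([1.0, 0.0], pool, k=2):
        print(idx, round(score, 4))
```

In a graph-structure-learning setup, the retrieved candidates would then be re-weighted or pruned by a learned module before message passing; that step is omitted here because its details are not given in this summary.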