Multitask Active Learning for Graph Anomaly Detection
Wenjing Chang, Kay Liu, Kaize Ding, Philip S. Yu, Jianjun Yu
TL;DR
This work tackles graph anomaly detection under limited supervision by introducing MITIGATE, a multitask active learning framework that leverages node classification as auxiliary supervision to detect anomalies and actively query informative nodes. A shared GCN encoder drives two decoders (one for classification and one for anomaly scoring), and a hybrid score combines their signals. The node-selection strategy blends distance-based clustering with cross-task confidence differences, using a masked aggregation to ensure representativeness and diversity. Empirical results on four datasets show MITIGATE outperforms state-of-the-art baselines, especially under small labeling budgets, and ablations confirm the importance of uncertainty loss, the confidence-difference informativeness, and masked-distance features. This approach provides a scalable, label-efficient pathway for robust graph anomaly detection in security-sensitive web contexts, with publicly available code for reproducibility.
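The shared-encoder design summarized above can be illustrated with a minimal NumPy sketch. This is not the released MITIGATE code: the single-layer encoder, the random weights, the helper names, and in particular the hybrid score (a convex combination of the anomaly probability and the classification uncertainty, weighted by a hypothetical `alpha`) are assumptions made only for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(adj, x, w_enc, w_cls, w_ad, alpha=0.5):
    """One shared GCN layer feeding two task-specific linear decoders."""
    h = np.maximum(normalize_adjacency(adj) @ x @ w_enc, 0.0)  # shared encoder (ReLU)
    class_probs = softmax(h @ w_cls)                           # classification decoder
    anomaly_prob = 1.0 / (1.0 + np.exp(-(h @ w_ad).ravel()))   # anomaly decoder (sigmoid)
    # ASSUMPTION: the hybrid score mixes the anomaly probability with the
    # classification uncertainty; the paper's exact formula may differ.
    cls_uncertainty = 1.0 - class_probs.max(axis=1)
    hybrid = alpha * anomaly_prob + (1.0 - alpha) * cls_uncertainty
    return class_probs, anomaly_prob, hybrid

# Toy path graph with 4 nodes, 3 features, 2 classes (all values illustrative).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
w_enc = rng.normal(size=(3, 8))
w_cls = rng.normal(size=(8, 2))
w_ad = rng.normal(size=(8, 1))
class_probs, anomaly_prob, hybrid = forward(adj, x, w_enc, w_cls, w_ad)
```

In this sketch both decoders read the same hidden representation `h`, so a gradient from either task would update the shared encoder; that coupling is what lets classification labels act as auxiliary supervision.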
Abstract
In the web era, graph machine learning has been widely used on ubiquitous graph-structured data. As a pivotal component for bolstering web security and enhancing the robustness of graph-based applications, graph anomaly detection is becoming increasingly important. While Graph Neural Networks (GNNs) have demonstrated efficacy in supervised and semi-supervised graph anomaly detection, their performance is contingent upon the availability of sufficient ground truth labels. The labor-intensive nature of identifying anomalies from complex graph structures poses a significant challenge in real-world applications. In contrast, indirect supervision signals from other tasks (e.g., node classification) are relatively abundant. In this paper, we propose a novel MultItask acTIve Graph Anomaly deTEction framework, namely MITIGATE. Firstly, by coupling node classification tasks, MITIGATE obtains the capability to detect out-of-distribution nodes without known anomalies. Secondly, MITIGATE quantifies the informativeness of nodes by the confidence difference across tasks, allowing samples with conflicting predictions to provide informative yet not excessively challenging information for subsequent training. Finally, to enhance the likelihood of selecting representative nodes that are distant from known patterns, MITIGATE adopts a masked aggregation mechanism for distance measurement, considering both the inherent features of nodes and their current labeling status. Empirical studies on four datasets demonstrate that MITIGATE significantly outperforms the state-of-the-art methods for anomaly detection. Our code is publicly available at: https://github.com/AhaChang/MITIGATE.
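To make the confidence-difference idea concrete, the following minimal NumPy sketch ranks unlabeled nodes by how much the two tasks disagree in confidence. It covers only the informativeness part of the selection and leaves out the masked-aggregation distance term. The function names, the definition of each task's confidence, and the top-k rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def confidence_difference(class_probs, anomaly_prob):
    """Absolute gap between the two tasks' prediction confidences.

    ASSUMPTION: classification confidence is the maximum class probability,
    and anomaly confidence is max(p, 1 - p) for anomaly probability p.
    """
    cls_conf = class_probs.max(axis=1)
    ad_conf = np.maximum(anomaly_prob, 1.0 - anomaly_prob)
    return np.abs(cls_conf - ad_conf)

def select_queries(class_probs, anomaly_prob, labeled_mask, budget):
    """Return indices of the `budget` unlabeled nodes with the largest gap."""
    scores = confidence_difference(class_probs, anomaly_prob)
    scores = np.where(labeled_mask, -np.inf, scores)   # never re-query labeled nodes
    budget = min(budget, int((~labeled_mask).sum()))
    order = np.argsort(-scores, kind="stable")         # descending by score
    return order[:budget]

# Toy predictions for 4 nodes (values are made up for illustration).
class_probs = np.array([[0.90, 0.10],
                        [0.55, 0.45],
                        [0.98, 0.02],
                        [0.60, 0.40]])
anomaly_prob = np.array([0.10, 0.95, 0.50, 0.40])
labeled_mask = np.array([True, False, False, False])

queries = select_queries(class_probs, anomaly_prob, labeled_mask, budget=2)
print(queries.tolist())  # [2, 1]
```

Node 2 is confidently classified but maximally ambiguous for the anomaly head, and node 1 shows the reverse pattern, so both rank highest. Node 3 is uncertain under both tasks and scores near zero, which matches the stated goal of preferring conflicting predictions over uniformly hard samples.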
