A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation
Gaoxiang Zhao, Lu Wang, Xiaoqiang Wang
TL;DR
This work addresses anomaly detection under model uncertainty and extreme class imbalance by proposing a Mallows-like focal loss (MFL) criterion to optimize ensemble weights $\boldsymbol{\omega}$ on the model-averaging simplex. It integrates the MFL criterion into Random Forest, training $M$ base trees and solving for $\boldsymbol{\omega}^*$ to form a weighted ensemble, with a complexity penalty reflecting base-model size $k_m$ and focal-loss parameters $(\alpha,\gamma)$. Empirical results on the KDDCup network intrusion dataset and ten imbalanced UCI datasets show that MFL-based RF improves AUC, ARI, and Recall relative to cross-entropy model averaging and several standard anomaly detectors, indicating improved accuracy and robustness. The approach advances anomaly detection by explicitly addressing data imbalance and model uncertainty through a principled, regularized, model-averaging framework and offers practical gains for cybersecurity and other domains.
Abstract
The effectiveness of anomaly signal detection can be significantly undermined by the uncertainty inherent in relying on a single specified model. Within the model-averaging framework, this paper proposes a novel criterion for selecting the weights used to aggregate multiple models, in which a focal loss function accounts for the classification of extremely imbalanced data. This strategy is further integrated into the Random Forest algorithm, where it replaces the conventional voting method. We evaluate the proposed method on benchmark datasets from various domains, including network intrusion. The results indicate that the proposed method outperforms both model averaging with typical loss functions and common anomaly detection algorithms in terms of accuracy and robustness.
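To make the weighting idea concrete, the following is a minimal Python sketch of choosing ensemble weights $\boldsymbol{\omega}$ on the simplex by minimizing a focal loss plus a size penalty. It is not the paper's exact MFL criterion, whose formula is not stated in this summary. The penalty form `lam * sum(w_m * k_m) / n`, the exponentiated-gradient solver, the function names, and the toy probabilities are all assumptions made for illustration. The per-model probabilities stand in for the outputs of $M$ base trees.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one sample; p is the predicted P(y = 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithms finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

def criterion(w, probs, y, sizes, lam, alpha, gamma):
    """Mean focal loss of the weighted ensemble plus an ASSUMED size penalty."""
    n = len(y)
    loss = 0.0
    for i in range(n):
        p = sum(w[m] * probs[m][i] for m in range(len(w)))  # weighted average
        loss += focal_loss(p, y[i], alpha, gamma)
    # Hypothetical Mallows-style penalty: larger base models (k_m) cost more.
    penalty = lam * sum(wm * km for wm, km in zip(w, sizes)) / n
    return loss / n + penalty

def fit_weights(probs, y, sizes, lam=0.01, alpha=0.25, gamma=2.0,
                eta=0.5, steps=200):
    """Exponentiated-gradient search over the simplex (a stand-in solver)."""
    M = len(probs)
    args = (probs, y, sizes, lam, alpha, gamma)
    w = [1.0 / M] * M  # start from uniform weights
    best_w, best_val = list(w), criterion(w, *args)
    h = 1e-6  # step for central finite differences
    for _ in range(steps):
        grad = []
        for m in range(M):
            up, dn = list(w), list(w)
            up[m] += h
            dn[m] -= h
            grad.append((criterion(up, *args) - criterion(dn, *args)) / (2 * h))
        # The multiplicative update keeps weights positive; renormalizing
        # keeps them summing to one.
        w = [wm * math.exp(-eta * g) for wm, g in zip(w, grad)]
        total = sum(w)
        w = [wm / total for wm in w]
        val = criterion(w, *args)
        if val < best_val:  # remember the best point visited so far
            best_w, best_val = list(w), val
    return best_w, best_val

# Toy data (made up): 8 samples, 2 anomalies, 3 base models of differing quality.
y = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [
    [0.10, 0.20, 0.10, 0.05, 0.15, 0.10, 0.90, 0.80],  # informative model
    [0.30, 0.40, 0.20, 0.50, 0.30, 0.35, 0.60, 0.55],  # noisy model
    [0.60, 0.50, 0.70, 0.55, 0.60, 0.65, 0.30, 0.40],  # misleading model
]
sizes = [5, 9, 3]  # hypothetical base-model sizes k_m (e.g. leaf counts)

uniform_value = criterion([1.0 / 3] * 3, probs, y, sizes, 0.01, 0.25, 2.0)
weights, value = fit_weights(probs, y, sizes)
print("weights:", [round(wm, 3) for wm in weights])
print("criterion: uniform =", round(uniform_value, 4), "fitted =", round(value, 4))
```

In a Random Forest setting, `probs[m]` would be replaced by the class-1 probabilities from the $m$-th tree on the training or validation data, and the fitted weights would replace equal-weight voting when combining tree outputs. A constrained solver could be substituted for the exponentiated-gradient loop without changing the rest of the sketch.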
