A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation

Gaoxiang Zhao; Lu Wang; Xiaoqiang Wang

TL;DR

This work addresses anomaly detection under model uncertainty and extreme class imbalance by proposing a Mallows-like focal loss (MFL) criterion to optimize ensemble weights $\boldsymbol{\omega}$ on the model-averaging simplex. It integrates the MFL criterion into Random Forest, training $M$ base trees and solving for $\boldsymbol{\omega}^*$ to form a weighted ensemble, with a complexity penalty reflecting base-model size $k_m$ and focal-loss parameters $(\alpha,\gamma)$. Empirical results on the KDDCup network intrusion dataset and ten imbalanced UCI datasets show that MFL-based RF improves AUC, ARI, and Recall relative to cross-entropy model averaging and several standard anomaly detectors, indicating improved accuracy and robustness. The approach advances anomaly detection by explicitly addressing data imbalance and model uncertainty through a principled, regularized, model-averaging framework and offers practical gains for cybersecurity and other domains.
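For orientation, the ingredients named above can be written out. The simplex, the weighted ensemble, and the focal loss follow their standard definitions; the final line is only an assumed Mallows-type shape (empirical focal loss plus a size penalty with a hypothetical coefficient $\lambda$), not necessarily the exact criterion defined in the paper. Here $\hat{p}_m(x)$ denotes the anomaly probability predicted by base tree $m$, $p_t = \hat{p}_{\boldsymbol{\omega}}(x)$ if $y = 1$ and $1 - \hat{p}_{\boldsymbol{\omega}}(x)$ otherwise, and $\alpha_t = \alpha$ if $y = 1$ and $1 - \alpha$ otherwise.

$$
\begin{aligned}
\mathcal{W} &= \Big\{ \boldsymbol{\omega} \in [0,1]^M : \textstyle\sum_{m=1}^{M} \omega_m = 1 \Big\}, \\
\hat{p}_{\boldsymbol{\omega}}(x) &= \sum_{m=1}^{M} \omega_m \, \hat{p}_m(x), \\
\mathrm{FL}(p_t) &= -\alpha_t \, (1 - p_t)^{\gamma} \log p_t, \\
\boldsymbol{\omega}^* &= \arg\min_{\boldsymbol{\omega} \in \mathcal{W}} \Big\{ \sum_{i=1}^{n} \mathrm{FL}\big(p_{t,i}(\boldsymbol{\omega})\big) + \lambda \sum_{m=1}^{M} \omega_m k_m \Big\} \quad \text{(assumed form)}.
\end{aligned}
$$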

Abstract

Anomaly detection can be significantly undermined by the uncertainty inherent in relying on a single specified model. Within the model averaging framework, this paper proposes a novel criterion for selecting the weights used to aggregate multiple models, in which the focal loss function accounts for the classification of extremely imbalanced data. This strategy is then integrated into the Random Forest algorithm, replacing the conventional voting method. We evaluate the proposed method on benchmark datasets from various domains, including network intrusion. The results indicate that the proposed method outperforms both model averaging with typical loss functions and common anomaly detection algorithms in terms of accuracy and robustness.
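As a rough illustration of replacing voting with learned weights, the following minimal sketch fits simplex-constrained weights over base-model probabilities by minimizing a focal loss plus a size penalty. It is not the authors' implementation: the penalty form, its coefficient `lam`, the exponentiated-gradient solver, and all function names are assumptions made for this example. In a Random Forest setting, column `m` of `P` would hold tree `m`'s predicted anomaly probabilities and `k[m]` a measure of its size.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample binary focal loss; alpha weights the positive (anomaly) class."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    a_t = np.where(y == 1, alpha, 1 - alpha)
    return -a_t * (1 - p_t) ** gamma * np.log(p_t)

def mfl_criterion(w, P, y, k, lam=0.01, alpha=0.25, gamma=2.0):
    """Assumed criterion: mean focal loss of the weighted ensemble
    plus a penalty proportional to the weighted base-model size."""
    return focal_loss(P @ w, y, alpha, gamma).mean() + lam * (w @ k) / len(y)

def criterion_grad(w, P, y, k, lam, alpha, gamma, eps=1e-7):
    """Analytic gradient of mfl_criterion with respect to the weights."""
    p = np.clip(P @ w, eps, 1 - eps)
    sign = np.where(y == 1, 1.0, -1.0)           # d p_t / d p
    p_t = np.where(y == 1, p, 1 - p)
    a_t = np.where(y == 1, alpha, 1 - alpha)
    dl_dpt = a_t * (gamma * (1 - p_t) ** (gamma - 1) * np.log(p_t)
                    - (1 - p_t) ** gamma / p_t)
    return P.T @ (dl_dpt * sign) / len(y) + lam * k / len(y)

def fit_weights(P, y, k, lam=0.01, alpha=0.25, gamma=2.0, lr=0.5, n_iter=500):
    """Exponentiated-gradient descent on the simplex (a solver chosen for
    this sketch). Returns the best weight vector seen, starting from uniform."""
    M = P.shape[1]
    w = np.full(M, 1.0 / M)
    best_w, best_c = w.copy(), mfl_criterion(w, P, y, k, lam, alpha, gamma)
    for _ in range(n_iter):
        z = -lr * criterion_grad(w, P, y, k, lam, alpha, gamma)
        w = w * np.exp(z - z.max())              # multiplicative update
        w /= w.sum()                             # renormalize onto the simplex
        c = mfl_criterion(w, P, y, k, lam, alpha, gamma)
        if c < best_c:
            best_w, best_c = w.copy(), c
    return best_w

# Synthetic demo: one informative base model and two noise models.
rng = np.random.default_rng(0)
y = (rng.random(400) < 0.05).astype(int)         # about 5% anomalies
informative = np.where(y == 1, 0.9, 0.1)
P = np.column_stack([informative, rng.random(400), rng.random(400)])
k = np.array([8.0, 16.0, 32.0])                  # hypothetical model sizes
w = fit_weights(P, y, k)
print("weights:", np.round(w, 3))
```

The multiplicative update keeps every weight non-negative and the renormalization keeps the weights summing to one, so no explicit projection step is needed. A general-purpose constrained solver would be an equally reasonable choice.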

Paper Structure (6 sections, 7 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Proposed Method
Mallows-like Focal Loss Criterion
Random Forest with MFL
Experiments
Discussion

Figures (2)

Figure 1: Schematic diagram of the proposed model averaging method. The approach minimizes the MFL criterion to allocate weights to the base decision trees, mitigating the effects of data imbalance while controlling model complexity.
Figure 2: Network intrusion dataset methodology metrics.
