Function Based Isolation Forest (FuBIF): A Unifying Framework for Interpretable Isolation-Based Anomaly Detection

Alessio Arcudi, Alessandro Ferreri, Francesco Borsatti, Gian Antonio Susto

TL;DR

FuBIF introduces a unifying, function-based framework for isolation-based anomaly detection. It partitions space with splitting functions $f\in\mathcal{F}$ and thresholds drawn from a distribution $\mu$, and scores each point by its path length through the resulting trees. Its companion, FuBIFFI, provides feature attributions that apply to any FuBIF model: it traces per-node contributions along each point's path through the forest and aggregates them into a global feature importance score ($GFI$) that contrasts inliers with outliers. Classic IF variants (e.g., axis-aligned, linear, radial, NN-based) arise as special cases of the triple $(\mathcal{F},\rho,\mu)$, and FuBIF shows competitive performance on real and synthetic AD datasets while offering interpretable explanations. Results are dataset-dependent and reveal biases inherent to certain function geometries; nonetheless, FuBIF lays a foundation for adaptive, ensemble, and bias-aware anomaly detection, backed by reproducible, open-source implementations. Future work targets bias mitigation, adaptive function selection, and ensemble strategies to further improve robustness and interpretability on complex data.
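To make the function-based splitting concrete, here is a minimal pure-Python sketch of an isolation forest whose nodes split on a function drawn from a family $\mathcal{F}$. It is not the authors' implementation. Assumptions: $\mathcal{F}$ contains random linear functions $f(x)=w\cdot x$ and radial functions $f(x)=\lVert x-c\rVert$; $f$ is picked uniformly from the family (a stand-in for $\rho$); the threshold is drawn uniformly over the range of $f$ on the node's points (a stand-in for $\mu$); and the score reuses the standard Isolation Forest normalisation $2^{-E[h(x)]/c(\psi)}$. All function names (`random_linear`, `random_radial`, `build_tree`, `path_length`, `fit`, `score`) are hypothetical.

```python
import math
import random


def c(n):
    """Expected path length of an unsuccessful BST search over n points (the usual IF normaliser)."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # H(n-1), computed exactly
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def random_linear(points, rng):
    """Hypothetical member of F: f(x) = w . x with a random Gaussian direction w."""
    w = [rng.gauss(0.0, 1.0) for _ in points[0]]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def random_radial(points, rng):
    """Hypothetical member of F: f(x) = ||x - centre||, centre drawn from the node's points."""
    centre = list(rng.choice(points))
    return lambda x: math.dist(x, centre)


def build_tree(points, depth, max_depth, families, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    f = rng.choice(families)(points, rng)   # draw f from F (uniform choice: an assumption)
    values = [f(x) for x in points]
    lo, hi = min(values), max(values)
    if lo == hi:                             # f cannot separate these points
        return ("leaf", len(points))
    t = rng.uniform(lo, hi)                  # threshold (uniform on f's range: an assumption)
    left = [x for x, v in zip(points, values) if v <= t]
    right = [x for x, v in zip(points, values) if v > t]
    return ("node", f, t,
            build_tree(left, depth + 1, max_depth, families, rng),
            build_tree(right, depth + 1, max_depth, families, rng))


def path_length(tree, x, depth=0):
    if tree[0] == "leaf":
        return depth + c(tree[1])            # correct for the unexpanded subtree at the leaf
    _, f, t, left, right = tree
    return path_length(left if f(x) <= t else right, x, depth + 1)


def fit(data, n_trees=100, sample_size=64, families=(random_linear, random_radial), seed=0):
    """Build n_trees trees on random subsamples; data needs at least two points."""
    rng = random.Random(seed)
    psi = max(2, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, families, rng) for _ in range(n_trees)]
    return trees, psi


def score(model, x):
    """IF-style anomaly score 2^(-E[h(x)] / c(psi)); values closer to 1 suggest anomalies."""
    trees, psi = model
    mean_h = sum(path_length(tree, x) for tree in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


data_rng = random.Random(1)
inliers = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
model = fit(inliers + [[6.0, 6.0]])
s_out, s_in = score(model, [6.0, 6.0]), score(model, [0.0, 0.0])
print(f"outlier {s_out:.3f}  centre {s_in:.3f}")
```

In this sketch, moving between IF variants only means changing `families`; for example, passing `(random_linear,)` alone gives random-hyperplane splits similar in spirit to Extended IF. The paper's own function families and distributions may differ.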

Abstract

Anomaly Detection (AD) is evolving through algorithms capable of identifying outliers in complex datasets. The Isolation Forest (IF), a pivotal AD technique, has limited adaptability and exhibits inherent biases. This paper introduces the Function-based Isolation Forest (FuBIF), a generalization of IF that uses real-valued functions to branch the dataset, significantly increasing the flexibility with which evaluation trees are built. Complementing this, the FuBIF Feature Importance (FuBIFFI) algorithm extends interpretability in IF-based approaches by providing feature importance scores for any FuBIF model. This paper details the operational framework of FuBIF, evaluates its performance against established methods, and explores its theoretical contributions. An open-source implementation is provided to encourage further research and ensure reproducibility.
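The idea of FuBIFFI, tracing per-node contributions along each point's path and contrasting outliers with inliers, can be sketched as follows. This is a hypothetical illustration, not the paper's formula. Assumptions: each internal node on a point's path contributes the normalised absolute gradient $|\partial f/\partial x_j|/\lVert\nabla f\rVert$ (estimated here by central finite differences), weighted by the ratio of points in the node to points in the child the point falls into, a heuristic loosely echoing the imbalance-weighted derivative directions of Figure 3; and $GFI$ is taken as the mean outlier importance divided by the mean inlier importance, per feature. Names such as `grad`, `random_f`, `importance`, and `gfi` are hypothetical, and the example uses known labels where a real pipeline would use predicted ones.

```python
import math
import random


def grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar branching function f at x."""
    g = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += eps
        down[j] -= eps
        g.append((f(up) - f(down)) / (2 * eps))
    return g


def random_f(points, rng):
    """Hypothetical family F: a random linear or radial function, each with probability 1/2."""
    if rng.random() < 0.5:
        w = [rng.gauss(0.0, 1.0) for _ in points[0]]
        return lambda x: sum(wi * xi for wi, xi in zip(w, x))
    centre = list(rng.choice(points))
    return lambda x: math.dist(x, centre)


def build(points, depth, max_depth, rng):
    """Isolation tree whose internal nodes keep f, t and the point counts used for attribution."""
    if depth >= max_depth or len(points) <= 1:
        return None
    f = random_f(points, rng)
    values = [f(p) for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    t = rng.uniform(lo, hi)
    left = [p for p, v in zip(points, values) if v <= t]
    right = [p for p, v in zip(points, values) if v > t]
    return (f, t, len(points), len(left), len(right),
            build(left, depth + 1, max_depth, rng),
            build(right, depth + 1, max_depth, rng))


def fit(data, n_trees=50, sample_size=64, seed=0):
    rng = random.Random(seed)
    psi = max(2, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(psi))
    return [build(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]


def importance(tree, x):
    """Accumulate per-feature contributions along x's path (assumed weighting)."""
    imp = [0.0] * len(x)
    while tree is not None:
        f, t, n, n_left, n_right, left, right = tree
        goes_left = f(x) <= t
        weight = n / max(n_left if goes_left else n_right, 1)  # large when x lands in a sparse child
        g = grad(f, x)
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        for j, gj in enumerate(g):
            imp[j] += weight * abs(gj) / norm                 # normalised |df/dx_j|
        tree = left if goes_left else right
    return imp


def mean_importance(trees, points):
    total = [0.0] * len(points[0])
    for x in points:
        for tree in trees:
            for j, v in enumerate(importance(tree, x)):
                total[j] += v
    return [v / (len(points) * len(trees)) for v in total]


def gfi(trees, inliers, outliers):
    """Hypothetical GFI: mean outlier importance divided by mean inlier importance, per feature."""
    i_out = mean_importance(trees, outliers)
    i_in = mean_importance(trees, inliers)
    return [o / i if i > 0 else math.inf for o, i in zip(i_out, i_in)]


data_rng = random.Random(3)
inliers = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
outliers = [[8.0, data_rng.gauss(0, 1)] for _ in range(10)]  # shifted along feature 0 only
trees = fit(inliers + outliers)
scores = gfi(trees, inliers, outliers)
print([round(s, 2) for s in scores])
```

Because the outliers differ from the inliers only along feature 0, one would expect the first entry of `scores` to exceed the second. Using the gradient of $f$ rather than a fixed split direction is what lets the same attribution rule cover linear, radial, and other differentiable branching functions.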

Paper Structure

This paper contains 18 sections, 14 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Anomaly scoremaps generated by different IF-based models, where darker colors indicate more anomalous regions.
  • Figure 2: Anomaly scoremaps generated by Quadric($\lambda$)-IF models for $\lambda=1$ and $\lambda=100$. The right column shows the same dataset translated 10 units to the left, with the resulting bias highlighted.
  • Figure 3: The images show imbalance projections from a branching function toward the mean direction of two subsets. The color map indicates function score, with the black line marking the threshold. Green and blue points represent the subsets and their derivative directions. The thick black arrow shows the mean derivative direction, its thickness reflecting the imbalance in each subset.
  • Figure 4: Heatmap of $AUC_{FS}$ scores for datasets in Scenario II with a normal threshold distribution. A higher score indicates better feature selection.
  • Figure 5: Computational time of the models varying the size and dimensionality of the dataset.
  • ...and 1 more figure