Function Based Isolation Forest (FuBIF): A Unifying Framework for Interpretable Isolation-Based Anomaly Detection
Alessio Arcudi, Alessandro Ferreri, Francesco Borsatti, Gian Antonio Susto
TL;DR
FuBIF introduces a unifying, function-based framework for isolation-based anomaly detection. Instead of fixed axis-aligned cuts, it partitions the space with splitting functions $f\in\mathcal{F}$, sampled according to a distribution $\rho$, and thresholds drawn from a distribution $\mu$. As in IF, points are then scored by path-length-based anomaly scores. Its companion, FuBIFFI, provides feature attributions for any FuBIF model by tracing per-node contributions along each tree path and aggregating them into a global feature importance (GFI) score that contrasts inliers with outliers. Classic IF variants (e.g., axis-aligned, linear, radial, and neural-network-based splits) are recovered as special cases through particular choices of $(\mathcal{F},\rho,\mu)$. Experiments on real and synthetic AD datasets show competitive detection performance together with interpretable explanations. Results remain dataset-dependent, however, and reveal biases inherent to certain split geometries. FuBIF thus lays a foundation for adaptive, ensemble, and bias-aware anomaly detection, backed by a reproducible open-source implementation; future work targets bias mitigation, adaptive function selection, and ensemble strategies to improve robustness and interpretability on complex data.
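To make the $(\mathcal{F},\rho,\mu)$ view concrete, here is a minimal Python sketch, assuming NumPy. It is not the authors' open-source implementation. The helper names (`axis_aligned`, `linear`, `uniform_threshold`, `fubif_scores`) are hypothetical, $\mu$ is assumed to be uniform over the range of $f$ on each node's data, and the score is assumed to follow the standard IF normalization $s(x)=2^{-E[h(x)]/c(\psi)}$. Swapping the sampler that plays the role of $\rho$ switches between an axis-aligned family (classic IF) and a random-hyperplane (linear) family. Radial or neural-network families would plug in the same way.

```python
import math
import numpy as np

def c(n):
    """Average path length of an unsuccessful BST search over n points (standard IF normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

# Hypothetical samplers standing in for rho: each returns f mapping an (m, d) array to m reals.
def axis_aligned(rng, X):
    j = rng.integers(X.shape[1])  # one random coordinate, as in classic IF
    return lambda Z: Z[:, j]

def linear(rng, X):
    w = rng.normal(size=X.shape[1])  # random hyperplane direction
    return lambda Z: Z @ w

def uniform_threshold(rng, values):
    """Assumed mu: uniform between the min and max of f over the node's data."""
    return rng.uniform(values.min(), values.max())

def build(X, rng, sample_f, sample_t, depth, max_depth):
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}
    f = sample_f(rng, X)
    v = f(X)
    if v.min() == v.max():  # all points coincide under f: make a leaf
        return {"size": len(X)}
    t = sample_t(rng, v)
    left = v <= t
    return {"f": f, "t": t,
            "left": build(X[left], rng, sample_f, sample_t, depth + 1, max_depth),
            "right": build(X[~left], rng, sample_f, sample_t, depth + 1, max_depth)}

def path_length(x, node, depth=0):
    if "f" not in node:  # leaf: add the expected remaining depth for its size
        return depth + c(node["size"])
    branch = "left" if node["f"](x[None, :])[0] <= node["t"] else "right"
    return path_length(x, node[branch], depth + 1)

def fubif_scores(X_train, X_test, sample_f, n_trees=100, psi=256, seed=0):
    """Hypothetical helper: scores 2 ** (-E[h(x)] / c(psi)); higher means more anomalous."""
    rng = np.random.default_rng(seed)
    psi = min(psi, len(X_train))
    max_depth = math.ceil(math.log2(psi))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X_train), size=psi, replace=False)
        trees.append(build(X_train[idx], rng, sample_f, uniform_threshold, 0, max_depth))
    h = np.array([[path_length(x, tr) for tr in trees] for x in X_test]).mean(axis=1)
    return 2.0 ** (-h / c(psi))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
X_test = np.array([[0.0, 0.0], [6.0, 6.0]])  # a central point and a far-away point
scores = {fam.__name__: fubif_scores(X, X_test, fam) for fam in (axis_aligned, linear)}
print(scores)  # for both families the far-away point should get the higher score
```

In this sketch only `sample_f` changes between the two variants. The tree builder, the path-length recursion, and the scoring are shared, which loosely mirrors how classic IF variants become special cases of a single framework.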
Abstract
Anomaly Detection (AD) is evolving through algorithms capable of identifying outliers in complex datasets. The Isolation Forest (IF), a pivotal AD technique, has limited adaptability and exhibits known biases. This paper introduces the Function-based Isolation Forest (FuBIF), a generalization of IF that allows real-valued functions to be used for branching the dataset, significantly increasing the flexibility of evaluation tree construction. Complementing this, the FuBIF Feature Importance (FuBIFFI) algorithm extends interpretability in IF-based approaches by providing feature importance scores for any FuBIF model. This paper details the operational framework of FuBIF, evaluates its performance against established methods, and explores its theoretical contributions. An open-source implementation is provided to encourage further research and ensure reproducibility.
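The FuBIFFI idea of tracing per-node contributions along tree paths and contrasting outliers with inliers can be illustrated with a small, self-contained Python sketch (NumPy assumed). The exact FuBIFFI per-node rule is not given here, so the one below is a hypothetical stand-in. Each node on a point's path adds the normalized absolute gradient of its split function at that point, weighted by the ratio of the node's sample count to that of the child the point falls into, so that early, lopsided isolation counts more. GFI is assumed to be the ratio of mean importance over known outliers to mean importance over inliers. Gradients are estimated by central differences so that any differentiable $f$ could be used. Only linear splits are sampled here for brevity, and all function names are illustrative.

```python
import numpy as np

def build(X, rng, depth=0, max_depth=8):
    """Hypothetical FuBIF-style tree with linear splits f(x) = w . x."""
    node = {"n": len(X)}  # every node records its training sample count
    if depth >= max_depth or len(X) <= 1:
        return node
    w = rng.normal(size=X.shape[1])
    v = X @ w
    if v.min() == v.max():  # cannot split: keep as leaf
        return node
    t = rng.uniform(v.min(), v.max())
    node.update(f=lambda Z, w=w: Z @ w, t=t,
                left=build(X[v <= t], rng, depth + 1, max_depth),
                right=build(X[v > t], rng, depth + 1, max_depth))
    return node

def grad(f, x, eps=1e-5):
    """Central-difference gradient of a split function f at a single point x."""
    E = np.eye(len(x)) * eps  # row i is eps * e_i
    return (f(x + E) - f(x - E)) / (2 * eps)

def path_importance(x, node):
    """Assumed per-node rule: size-ratio weight times normalised |grad f(x)|."""
    imp = np.zeros(len(x))
    while "f" in node:
        go_left = node["f"](x[None, :])[0] <= node["t"]
        child = node["left"] if go_left else node["right"]
        g = np.abs(grad(node["f"], x))
        if g.sum() > 0:
            imp += node["n"] / max(child["n"], 1) * g / g.sum()
        node = child
    return imp

def gfi(X_train, X_eval, is_outlier, n_trees=50, psi=256, seed=0):
    """Assumed GFI: mean importance over outliers divided by mean over inliers."""
    rng = np.random.default_rng(seed)
    psi = min(psi, len(X_train))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [build(X_train[rng.choice(len(X_train), psi, replace=False)], rng, 0, max_depth)
             for _ in range(n_trees)]
    I = np.array([sum(path_importance(x, tr) for tr in trees) / n_trees for x in X_eval])
    return I[is_outlier].mean(axis=0) / I[~is_outlier].mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X[:15, 0] += 6.0  # the first 15 points are anomalous along feature 0 only
labels = np.zeros(len(X), dtype=bool)
labels[:15] = True
g = gfi(X, X, labels)
print(g.round(2))
```

Because the synthetic outliers differ from the inliers only along feature 0, that feature should receive the largest GFI under this assumed rule.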
