ALICE: Combining Feature Selection and Inter-Rater Agreeability for Machine Learning Insights

Bachana Anasashvili; Vahidin Jeleskovic

TL;DR

ALICE proposes a simple, automated framework that fuses backward feature elimination with inter-rater agreeability to shed light on black-box ML models. By evaluating two competing models across iteratively reduced feature sets and measuring agreement between their top predictions, ALICE reveals trade-offs between predictive performance and interpretability. The approach is demonstrated on the Telco churn task, showing that high agreement can accompany strong performance (e.g., between an MLP and Logistic Regression) but that best agreement does not always coincide with best $F1$ scores, underscoring the value of dual-criteria analysis for model selection. The work offers practical tooling for robustness checks and interpretable comparisons, with clear paths for extension to regression, additional feature-selection strategies, and broader library support.
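The TL;DR weighs agreement between two models' predictions against their $F1$ scores. Below is a minimal plain-Python sketch of both metrics. It assumes the agreement statistic is Cohen's $\kappa = (p_o - p_e)/(1 - p_e)$, matching the $\kappa$ reported in the figures, and that churn labels are binary. The names `cohen_kappa`, `f1_binary` and the toy prediction vectors are hypothetical and not part of the ALICE API.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(preds_a: Sequence[Hashable], preds_b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two models treated as 'raters' of the same rows."""
    if not preds_a or len(preds_a) != len(preds_b):
        raise ValueError("predictions must be non-empty and of equal length")
    n = len(preds_a)
    # Observed agreement p_o: share of rows on which both models predict the same label.
    p_o = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Chance agreement p_e: expected overlap given each model's own label frequencies.
    freq_a, freq_b = Counter(preds_a), Counter(preds_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both models output one identical constant label, so kappa is 0/0 here.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive class (e.g. churn = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels and predictions from two models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 1]
model_b = [1, 0, 0, 0, 0, 1, 1, 1]
print(f1_binary(y_true, model_a))     # 0.75
print(f1_binary(y_true, model_b))     # 0.5
print(cohen_kappa(model_a, model_b))  # 0.5
```

Here the two models differ on $F1$ (0.75 vs. 0.5) and agree on 6 of 8 rows. $\kappa$ discounts that raw agreement to 0.5 after correcting for the agreement expected by chance.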

Abstract

This paper presents a new Python library called Automated Learning for Insightful Comparison and Evaluation (ALICE), which merges conventional feature selection with the concept of inter-rater agreeability in a simple, user-friendly manner to gain insights into black-box machine learning (ML) models. The framework is proposed following an overview of the key concepts of interpretability in ML. The architecture of the framework and the intuition behind its main methods are discussed in detail, and results from initial experiments on a customer churn prediction task are presented, alongside possible avenues for future work. The full source code for the framework and the experiment notebooks can be found at: https://github.com/anasashb/aliceHU
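The combination of feature selection and agreeability can be illustrated as a greedy backward-elimination loop. At each step, every remaining feature is tentatively dropped, both models are refit on the reduced set, and the drop that best preserves performance is kept while the models' agreement is logged. This is a minimal sketch under stated assumptions, not the library's implementation. The `Model` callable interface, `eliminate_and_compare`, the mean-score selection rule, and the toy rule-based "models" are all hypothetical. Accuracy and raw percentage agreement stand in for $F1$ and $\kappa$ to keep the example dependency-free.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical interface: a "model" is any callable that fits on the given
# feature subset and returns its predictions for a fixed evaluation set.
Model = Callable[[FrozenSet[str]], List[int]]
Metric = Callable[[Sequence[int], Sequence[int]], float]


def eliminate_and_compare(features: Sequence[str], model_a: Model, model_b: Model,
                          y_true: Sequence[int], score: Metric, agree: Metric,
                          min_features: int = 1) -> List[Dict]:
    """Greedy backward elimination over a feature set shared by two models,
    logging both models' scores and their agreement at every step."""
    current = set(features)
    history: List[Dict] = []
    while len(current) > min_features:
        candidates = []
        # Tentatively drop each remaining feature (sorted for reproducibility).
        for feat in sorted(current):
            subset = frozenset(current - {feat})
            preds_a, preds_b = model_a(subset), model_b(subset)
            candidates.append({
                "dropped": feat,
                "score_a": score(y_true, preds_a),
                "score_b": score(y_true, preds_b),
                "agreement": agree(preds_a, preds_b),
            })
        # Assumed selection rule: keep the drop that leaves the two models'
        # mean score highest; ties go to the first feature in sorted order.
        best = max(candidates, key=lambda c: (c["score_a"] + c["score_b"]) / 2)
        current.remove(best["dropped"])
        history.append({**best, "remaining": sorted(current)})
    return history


# Stand-in metrics: accuracy instead of F1, raw agreement instead of kappa.
def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def raw_agreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Toy binary data in which the label simply equals feature x1.
ROWS = [
    {"x1": 1, "x2": 0, "x3": 1},
    {"x1": 0, "x2": 1, "x3": 1},
    {"x1": 1, "x2": 1, "x3": 0},
    {"x1": 0, "x2": 0, "x3": 0},
]
Y = [row["x1"] for row in ROWS]


def or_model(subset: FrozenSet[str]) -> List[int]:
    # Predicts 1 if any selected feature is 1 (no fitting needed for this toy rule).
    return [int(any(row[f] for f in subset)) for row in ROWS]


def majority_model(subset: FrozenSet[str]) -> List[int]:
    # Predicts 1 if more than half of the selected features are 1.
    return [int(2 * sum(row[f] for f in subset) > len(subset)) for row in ROWS]


for step in eliminate_and_compare(["x1", "x2", "x3"], or_model, majority_model,
                                  Y, accuracy, raw_agreement):
    print(step)
```

In this toy run, the two rules reach identical accuracy (0.75) after the first drop yet agree on only half of the rows. This small case shows why performance and agreement are tracked as separate criteria.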

Paper Structure (26 sections, 4 equations, 6 figures, 1 algorithm)

Introduction
Background and Related Work
Causal ML
Explainable AI
Inter-Rater Agreeability
Feature Selection
Proposed Framework
General Overview
Main Modules
Agreeability
Metrics
Testing
Utils
search_and_compare
Experiment Setting
...and 11 more sections

Figures (6)

Figure 1: The entire Automated Learning for Insightful Comparison and Evaluation (ALICE) framework.
Figure 2: Deep Feed-Forward Network obtained via Differential Evolution-based hyperparameter optimization.
Figure 3: Best and mean agreeabilities between models across the three experiments.
Figure 4: Best and mean $F1$ (right-hand $y$-axis) and $\kappa$ (left-hand $y$-axis) scores from the MLP vs. Logit trial.
Figure 5: Best and mean $F1$ (right-hand $y$-axis) and $\kappa$ (left-hand $y$-axis) scores from the MLP vs. RFC trial.
...and 1 more figure
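Figures 3 to 5 report best and mean scores. The sketch below assumes that "best" and "mean" are taken over the candidate feature subsets evaluated at each elimination step, which is an interpretation not confirmed by this page. It summarizes those scores and checks whether peak agreement coincides with peak $F1$. The functions `summarize_steps` and `argmax_step` are hypothetical, and the numbers are synthetic placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Sequence


def summarize_steps(per_step: Sequence[Dict[str, Sequence[float]]]) -> List[Dict[str, float]]:
    """Collapse the candidate scores evaluated at each elimination step
    into one best (max) and one mean value per metric."""
    summary = []
    for step, scores in enumerate(per_step, start=1):
        row: Dict[str, float] = {"step": step}
        for metric, values in scores.items():
            row[f"best_{metric}"] = max(values)
            row[f"mean_{metric}"] = mean(values)
        summary.append(row)
    return summary


def argmax_step(summary: Sequence[Dict[str, float]], key: str) -> int:
    """Step at which the given summary column peaks (first step wins ties)."""
    return int(max(summary, key=lambda row: row[key])["step"])


# Synthetic placeholder scores, NOT results from the paper: one list of
# candidate scores per elimination step and metric.
SYNTHETIC_STEPS = [
    {"f1": [0.60, 0.62, 0.58], "kappa": [0.70, 0.74, 0.69]},
    {"f1": [0.63, 0.61], "kappa": [0.72, 0.71]},
    {"f1": [0.59], "kappa": [0.80]},
]

summary = summarize_steps(SYNTHETIC_STEPS)
print("best F1 at step", argmax_step(summary, "best_f1"))        # best F1 at step 2
print("best kappa at step", argmax_step(summary, "best_kappa"))  # best kappa at step 3
```

With these placeholder numbers, the two peaks fall on different steps. This is the kind of mismatch the TL;DR describes when it notes that the best agreement does not always coincide with the best $F1$.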
