Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases

Marc Boullé, Nicolas Voisine, Bruno Guerraz, Carine Hue, Felipe Olmos, Vladimir Popescu, Stéphane Gouache, Stéphane Bouget, Alexis Bondu, Luc Aurelien Gauthier, Yassine Nair Benrekia, Fabrice Clérot, Vincent Lemaire

TL;DR

Khiops targets AutoML for very large, multi-table relational data by adopting a hyperparameter-free MODL Bayesian formalism to drive automatic feature construction, data preparation, and parsimonious learning, with integrated XAI. It supports multi-table schemas, text features, and end-to-end deployment via a Python interface and visualization tools, while adapting to hardware resources for scalable performance. The approach demonstrates competitive accuracy with reduced feature sets and energy usage on datasets like Accident and UNSW-NB15, accompanied by Shapley-based explanations for local interpretability. The work highlights practical impact in domains such as security analytics and customer data, offering a scalable, transparent, and deployable AutoML solution for big relational data.

Abstract

Khiops is an open-source machine learning tool designed for mining large multi-table databases. It is based on a unique Bayesian approach that has attracted academic interest, with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is suited to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available in many environments, both as a Python library and via a user interface.
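
To make the notion of propositionalisation concrete, the following minimal sketch flattens a one-to-many secondary table into per-individual aggregate columns. It is only an illustration under stated assumptions: the function `propositionalise`, the toy tables, and the fixed set of aggregates (count, mean, max) are hypothetical, and the sketch does not reproduce how Khiops itself constructs or selects aggregates.

```python
from collections import defaultdict
from statistics import mean


def propositionalise(primary_keys, secondary_rows, key, field):
    """Summarise the secondary-table records of each primary entity.

    Hypothetical helper for illustration only; the three aggregates are
    hard-coded here, whereas an AutoML tool would construct them itself.
    """
    # Group the secondary values by the key of the primary entity they belong to.
    grouped = defaultdict(list)
    for row in secondary_rows:
        grouped[row[key]].append(row[field])

    # Build one flat feature record per primary entity.
    features = {}
    for pk in primary_keys:
        values = grouped.get(pk, [])
        features[pk] = {
            "count": len(values),
            "mean": mean(values) if values else None,
            "max": max(values) if values else None,
        }
    return features


# Hypothetical toy data: accidents (primary table), vehicles (secondary table).
accident_ids = ["A1", "A2", "A3"]
vehicles = [
    {"AccidentId": "A1", "PassengerCount": 2},
    {"AccidentId": "A1", "PassengerCount": 4},
    {"AccidentId": "A2", "PassengerCount": 1},
]

flat = propositionalise(accident_ids, vehicles, "AccidentId", "PassengerCount")
for accident_id, row in flat.items():
    print(accident_id, row)
```

Each primary individual ends up with a fixed-width feature record, including individuals with no secondary records (count 0, missing mean and max), which is the flat form a classifier such as naive Bayes can consume.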

Paper Structure

This paper contains 23 sections, 2 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Machine learning process implemented by Khiops
  • Figure 2: Specification of the multi-table dataset
  • Figure 3: Learning and deploying on the Accidents database
  • Figure 4: Screenshot of Khiops Visualisation after analysing the accident database and constructing 100 aggregates
  • Figure 5: Variable importance for the 3 classifiers
  • ...and 1 more figure