Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases
Marc Boullé, Nicolas Voisine, Bruno Guerraz, Carine Hue, Felipe Olmos, Vladimir Popescu, Stéphane Gouache, Stéphane Bouget, Alexis Bondu, Luc Aurelien Gauthier, Yassine Nair Benrekia, Fabrice Clérot, Vincent Lemaire
TL;DR
Khiops targets AutoML for very large, multi-table relational data. It adopts a hyperparameter-free Bayesian formalism (MODL) to drive automatic feature construction, data preparation, and parsimonious learning, with integrated XAI. It supports multi-table schemas, text features, and end-to-end deployment via a Python interface and visualization tools, and it adapts to the available hardware resources for scalable performance. On datasets such as Accident and UNSW-NB15, the approach achieves competitive accuracy while using fewer features and less energy, and it provides Shapley-based explanations for local interpretability. The work highlights practical applications in domains such as security analytics and customer data analysis, offering a scalable, transparent, and deployable AutoML solution for big relational data.
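Because a weighted naive Bayes score is additive over variables, its Shapley values have a simple closed form: each variable's contribution is its weighted log-likelihood-ratio term minus that term's expectation. The sketch below illustrates this general property of additive models; it is not claimed to be Khiops's exact explanation formula. The model, its probabilities, and its weights are hypothetical illustrative numbers, and a weight of 0 would correspond to a deselected variable.

```python
from math import log

# Hypothetical fitted model (illustrative numbers, not from the paper).
# For each variable: a weight and, for each discretised value
# (interval or value group), the pair (P(value | y=1), P(value | y=0)).
PRIOR = {1: 0.3, 0: 0.7}
MODEL = {
    "age": {"weight": 0.8, "p": {"<30": (0.5, 0.2), ">=30": (0.5, 0.8)}},
    "income": {"weight": 0.4, "p": {"low": (0.6, 0.3), "high": (0.4, 0.7)}},
}


def term(var, value):
    """Weighted log-likelihood ratio contributed by one variable value."""
    p1, p0 = MODEL[var]["p"][value]
    return MODEL[var]["weight"] * log(p1 / p0)


def expected_term(var):
    """Expectation of term(var, .) under the marginal distribution of var."""
    return sum(
        (p1 * PRIOR[1] + p0 * PRIOR[0]) * term(var, value)
        for value, (p1, p0) in MODEL[var]["p"].items()
    )


def log_odds(x):
    """Weighted naive Bayes score: prior log-odds plus one weighted term per variable."""
    return log(PRIOR[1] / PRIOR[0]) + sum(term(v, x[v]) for v in MODEL)


def expected_log_odds():
    """Score expected under independent marginals: the Shapley baseline."""
    return log(PRIOR[1] / PRIOR[0]) + sum(expected_term(v) for v in MODEL)


def shapley(x):
    """Exact Shapley values of an additive score: each term minus its expectation."""
    return {v: term(v, x[v]) - expected_term(v) for v in MODEL}


x = {"age": "<30", "income": "high"}
phi = shapley(x)
# Efficiency: the contributions sum to score(x) minus the baseline score.
print(phi, sum(phi.values()), log_odds(x) - expected_log_odds())
```

The closed form avoids enumerating feature coalitions, which is what makes per-individual explanations cheap for this model family even with many selected variables.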
Abstract
Khiops is an open source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has been the subject of more than 20 academic publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it performs propositionalisation by automatically constructing aggregates. Khiops is suited to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available in many environments, both as a Python library and through a user interface.
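The variable-importance measure mentioned above rests on MODL discretisation. As a minimal sketch, assuming the supervised MODL discretisation criterion as we recall it from the MODL literature (a prior on the number of intervals, a prior on their bounds, a prior on each interval's class distribution, and a multinomial likelihood), the code below scores a given partition and derives an importance "level" as the compression gain over the single-interval null model. The function names are hypothetical, the level formula is our reading rather than a statement of Khiops's internals, and the search for the minimum-cost partition is not shown.

```python
from math import lgamma, log


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def modl_cost(intervals):
    """Assumed MODL cost of a discretisation with I intervals and J classes.

    `intervals` is a list of per-interval class-count lists, e.g.
    [[8, 0], [0, 8]] means 2 intervals and 2 classes.
    """
    n_intervals = len(intervals)
    n_classes = len(intervals[0])
    n = sum(sum(counts) for counts in intervals)
    # Prior on the number of intervals, then on the interval bounds.
    cost = log(n) + log_comb(n + n_intervals - 1, n_intervals - 1)
    for counts in intervals:
        n_i = sum(counts)
        # Prior on the class distribution within the interval.
        cost += log_comb(n_i + n_classes - 1, n_classes - 1)
        # Multinomial likelihood of the observed class counts.
        cost += lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)
    return cost


def level(intervals):
    """Compression gain over the single-interval null model, floored at 0."""
    n_classes = len(intervals[0])
    merged = [[sum(counts[j] for counts in intervals) for j in range(n_classes)]]
    return max(0.0, 1.0 - modl_cost(intervals) / modl_cost(merged))


print(level([[8, 0], [0, 8]]))  # class-separating split: positive level
print(level([[4, 4], [4, 4]]))  # uninformative split: level 0.0
```

Because the prior terms penalise extra intervals, an uninformative split costs more than the null model and receives zero importance, which is how such a criterion can rank variables without a tuned regularisation hyperparameter.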
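Propositionalisation flattens a one-to-many relation into one row per individual of the root table, so that a single-table learner can use it. The sketch below hard-codes four aggregates over a toy two-table schema purely for illustration; as we understand it, Khiops instead explores a much larger space of constructed variables and selects among them with its Bayesian criterion. The table names, column names, and the `propositionalise` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy schema: a root table of customers and a secondary
# table of orders, linked one-to-many by "customer_id".
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"customer_id": 1, "amount": 10.0, "channel": "web"},
    {"customer_id": 1, "amount": 30.0, "channel": "shop"},
    {"customer_id": 2, "amount": 5.0, "channel": "web"},
]


def propositionalise(root, secondary, key, table_name, numeric_col, categorical_col):
    """Flatten a one-to-many relation into one row of aggregates per root record."""
    # Group secondary records by their foreign key.
    groups = defaultdict(list)
    for rec in secondary:
        groups[rec[key]].append(rec)

    rows = []
    for ind in root:
        recs = groups.get(ind[key], [])  # individuals without records get empty aggregates
        values = [r[numeric_col] for r in recs]
        rows.append({
            key: ind[key],
            f"Count({table_name})": len(recs),
            f"Sum({numeric_col})": sum(values),
            f"Mean({numeric_col})": mean(values) if values else None,
            f"CountDistinct({categorical_col})": len({r[categorical_col] for r in recs}),
        })
    return rows


for row in propositionalise(customers, orders, "customer_id", "orders", "amount", "channel"):
    print(row)
```

Each constructed column can then be evaluated like any native variable of the root table, for example by discretising it and measuring its predictive importance.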
