eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles
Jamie J. R. Bennett, Aviad Susman, Yan Chak Li, Gaurav Pandey
TL;DR
This paper presents eipy, an open-source Python package for multi-modal data integration based on Ensemble Integration (EI). The data model is $\mathcal{X} = \{ \mathbf{X}_i \in \mathbb{R}^{n \times f_i} \mid i=1, \dots, m \}$, that is, $m$ modalities measured on the same $n$ samples, where modality $i$ has $f_i$ features. Modality-specific base predictors are trained on each $\mathbf{X}_i$ and then combined into ensembles, with nested cross-validation used for robust evaluation. Key contributions include a scikit-learn-like API with methods such as fit_base, fit_ensemble, and predict; a PermutationInterpreter for cross-modal interpretation; and a ready-to-use NHANES youth diabetes dataset available via load_diabetes. The package is distributed on PyPI, is accompanied by documentation and tutorials, and aims to make rigorous multi-modal integration accessible to non-specialists and researchers across domains.
Abstract
In this paper, we introduce eipy, an open-source Python package for developing effective, multi-modal heterogeneous ensembles for classification. eipy provides a framework that is both rigorous and user-friendly for comparing and selecting the best-performing multi-modal data integration and predictive modeling methods, systematically evaluating their performance using nested cross-validation. The package is designed to leverage scikit-learn-like estimators as components to build multi-modal predictive models. An up-to-date user guide for eipy, including an API reference and tutorials, is maintained at https://eipy.readthedocs.io. The main repository for this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy.
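To make the idea concrete, the following is a minimal sketch of modality-specific base predictors combined by a stacking ensemble and evaluated with nested cross-validation. It uses plain scikit-learn and NumPy rather than eipy itself, so it should not be read as the package's API or implementation. The helper names (make_modalities, base_predictions, nested_cv_auc), the synthetic data, the choice of base models, and the logistic-regression stacker are all assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def make_modalities(n=200, feature_counts=(5, 8), seed=0):
    """Hypothetical synthetic stand-in for {X_i in R^(n x f_i)} with a binary label."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    modalities = {}
    for i, f in enumerate(feature_counts):
        X = rng.normal(size=(n, f))
        X[:, 0] += 1.5 * y  # first feature of each modality carries label signal
        modalities[f"modality_{i}"] = X
    return modalities, y


def base_predictions(modalities, y, train_idx, test_idx, base_models, k_inner=3):
    """Build ensemble inputs: one column per (modality, base model) pair."""
    inner_cv = StratifiedKFold(n_splits=k_inner, shuffle=True, random_state=0)
    train_cols, test_cols = [], []
    for X in modalities.values():
        for model in base_models:
            # Inner folds: out-of-fold probabilities on the outer-training samples,
            # so the ensemble is trained on predictions for unseen data.
            oof = cross_val_predict(
                clone(model), X[train_idx], y[train_idx],
                cv=inner_cv, method="predict_proba",
            )[:, 1]
            # Refit on all outer-training samples to score the outer-test samples.
            fitted = clone(model).fit(X[train_idx], y[train_idx])
            train_cols.append(oof)
            test_cols.append(fitted.predict_proba(X[test_idx])[:, 1])
    return np.column_stack(train_cols), np.column_stack(test_cols)


def nested_cv_auc(modalities, y, base_models, k_outer=5):
    """Mean outer-fold AUC of a stacking ensemble over all modalities."""
    outer_cv = StratifiedKFold(n_splits=k_outer, shuffle=True, random_state=0)
    any_X = next(iter(modalities.values()))  # all modalities share the same samples
    scores = []
    for train_idx, test_idx in outer_cv.split(any_X, y):
        Z_train, Z_test = base_predictions(
            modalities, y, train_idx, test_idx, base_models
        )
        stacker = LogisticRegression().fit(Z_train, y[train_idx])
        scores.append(roc_auc_score(y[test_idx], stacker.predict_proba(Z_test)[:, 1]))
    return float(np.mean(scores))


if __name__ == "__main__":
    mods, labels = make_modalities()
    models = [LogisticRegression(), DecisionTreeClassifier(max_depth=3, random_state=0)]
    print(f"Nested-CV AUC: {nested_cv_auc(mods, labels, models):.3f}")
```

Each modality keeps its own feature space and its own base predictors; only their predicted probabilities are integrated. Because the ensemble is trained on inner out-of-fold predictions and scored on outer-fold samples that no base predictor has seen, the reported score is not inflated by reuse of the training data.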
