MLXP: A Framework for Conducting Replicable Experiments in Python
Michael Arbel, Alexandre Zouaoui
TL;DR
MLXP addresses reproducibility challenges in ML by offering a lightweight, Hydra-based framework that streamlines experiment configuration, parallel launching, and results exploitation with minimal intrusion. It introduces a MLXP launcher, a multi-run submission utility (mlxpsub), automated unique log directories, and a Git-based code versioning mechanism to preserve exact run conditions on HPC clusters. The framework also provides a reader for filtering, loading, grouping, and aggregating results with lazy evaluation, enabling scalable post-hoc analysis. Demonstrations on HySUPP and optimization benchmarks illustrate how MLXP reduces friction in conducting replicable experiments and fosters robust scientific conclusions in data science research.
Abstract
Replicability in machine learning (ML) research is increasingly concerning due to the utilization of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet often requires significant technical effort to conduct systematic and well-organized experiments that yield robust conclusions. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce complexity that hinders adoption within the research community, despite being well-handled in industrial settings. To address the challenge of low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python, available at https://github.com/inria-thoth/mlxp . MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
