MLXP: A Framework for Conducting Replicable Experiments in Python

Michael Arbel, Alexandre Zouaoui

TL;DR

MLXP addresses reproducibility challenges in ML with a lightweight, Hydra-based framework that streamlines experiment configuration, parallel launching, and the analysis of results, all with minimal intrusion. It introduces an MLXP launcher, a multi-run submission utility (mlxpsub), automatically created unique log directories, and a Git-based code versioning mechanism that preserves the exact conditions of each run on HPC clusters. The framework also provides a reader for filtering, loading, grouping, and aggregating results with lazy evaluation, enabling scalable post-hoc analysis. Demonstrations on HySUPP and optimization benchmarks illustrate how MLXP reduces the friction of conducting replicable experiments and supports robust scientific conclusions in data science research.
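To make two of the ideas above concrete (unique log directories, and filtering, grouping, and aggregating results with lazy loading), here is a minimal standard-library sketch. It is not MLXP's API: the names `new_run_dir`, `LazyRun`, and `mean_final_loss_by`, the integer-named run directories, and the `config.json` / `metrics.json` file layout are all assumptions made for illustration only.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from statistics import mean


def new_run_dir(parent: Path) -> Path:
    """Create the next unused integer-named directory under `parent` (hypothetical scheme)."""
    parent.mkdir(parents=True, exist_ok=True)
    existing = [int(p.name) for p in parent.iterdir() if p.is_dir() and p.name.isdigit()]
    run_dir = parent / str(max(existing, default=0) + 1)
    run_dir.mkdir()  # raises FileExistsError if another process took the same id
    return run_dir


class LazyRun:
    """Reads a run's small config eagerly; reads its metrics file only on first access."""

    def __init__(self, run_dir: Path):
        self.run_dir = run_dir
        self.config = json.loads((run_dir / "config.json").read_text())
        self._metrics = None

    @property
    def metrics(self):
        if self._metrics is None:  # deferred load: only runs that survive filtering pay this cost
            self._metrics = json.loads((self.run_dir / "metrics.json").read_text())
        return self._metrics


def mean_final_loss_by(runs, key):
    """Group runs by a config key and average the last logged loss within each group."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.config[key]].append(run.metrics["loss"][-1])
    return {k: mean(v) for k, v in groups.items()}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp) / "logs"
        fake_runs = [(0.1, 0, [1.0, 0.5]), (0.1, 1, [1.0, 0.7]), (0.01, 0, [1.0, 0.9])]
        for lr, seed, losses in fake_runs:
            d = new_run_dir(parent)
            (d / "config.json").write_text(json.dumps({"lr": lr, "seed": seed}))
            (d / "metrics.json").write_text(json.dumps({"loss": losses}))
        runs = [LazyRun(d) for d in sorted(parent.iterdir(), key=lambda p: int(p.name))]
        selected = [r for r in runs if r.config["seed"] == 0]  # filter using configs only
        names = sorted(p.name for p in parent.iterdir())
        return names, mean_final_loss_by(selected, "lr")


if __name__ == "__main__":
    print(demo())
```

In this sketch, filtering touches only the small configuration files, so the metrics of discarded runs are never read from disk. That is the general motivation for lazy evaluation when a log directory holds many runs.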

Abstract

Replicability in machine learning (ML) research is a growing concern, owing to the use of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet conducting systematic, well-organized experiments that yield robust conclusions often requires significant technical effort. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce complexity that hinders adoption within the research community, even though that complexity is well handled in industrial settings. To address this low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python, available at https://github.com/inria-thoth/mlxp . MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.

Paper Structure (34 sections, 9 figures)

Figures (9)

  • Figure 1: Configuring experiments using Hydra.
  • Figure 2: Configuring experiments without Hydra.
  • Figure 3: Project directory structure.
  • Figure 4: Log directory structure.
  • Figure 5: Execution of non-versioned jobs.
  • ...and 4 more figures