YAMLE: Yet Another Machine Learning Environment

Martin Ferianc, Miguel Rodrigues

TL;DR

The paper addresses reproducibility and boilerplate bottlenecks in ML research by introducing YAMLE, an open-source, modular environment designed for rapid prototyping and end-to-end experimentation. It presents a three-component architecture (BaseDataModule, BaseModel, and BaseMethod) mediated by a trainer/tester, with integrated logging (TensorBoard, torchmetrics), hyperparameter optimisation (syne-tune), and a command-line interface for configuration. The work contributes a configurable data module, reusable model abstractions, and a method layer for new training algorithms, plus a path toward a shared repository of implementations. The framework aims to lower the barrier to reproducible research by enabling straightforward comparisons across models and datasets within a unified ecosystem.

Abstract

YAMLE: Yet Another Machine Learning Environment is an open-source framework that facilitates rapid prototyping and experimentation with machine learning (ML) models and methods. The key motivation is to reduce repetitive work when implementing new approaches and improve reproducibility in ML research. YAMLE includes a command-line interface and integrations with popular and well-maintained PyTorch-based libraries to streamline training, hyperparameter optimisation, and logging. The ambition for YAMLE is to grow into a shared ecosystem where researchers and practitioners can quickly build on and compare existing implementations. Find it at: https://github.com/martinferianc/yamle.

Paper Structure (13 sections, 1 figure)

This paper contains 13 sections, 1 figure.

Figures (1)

  • Figure 1: The overview of the environment's design. It consists of three main components - BaseDataModule, BaseModel and BaseMethod managed by BaseTrainer/BaseTester. The BaseDataModule is responsible for downloading, loading and preprocessing data. The BaseModel defines the model's architecture. If necessary, the BaseMethod changes the model and defines the training, validation and test steps. The BaseTrainer/Tester groups the BaseDataModule, BaseModel and BaseMethod together with Logging. The whole training and testing can be overseen by Hyperparameter Optimisation. Additional components which actively change the training and can be defined by the user are Regularisation, Quantisation and Pruning.
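The division of responsibilities described in the Figure 1 caption can be illustrated with a minimal, dependency-free sketch. Every class and attribute name below (SketchDataModule, SketchModel, SketchMethod, SketchTrainer) is a hypothetical stand-in chosen for illustration; this is not YAMLE's actual API, and the real framework builds on PyTorch-based libraries rather than plain Python. The sketch only shows the separation of concerns: data handling, architecture, training and test steps, and a trainer that groups them and logs metrics.

```python
from typing import Dict, List, Tuple

# NOTE: all names here are hypothetical illustrations, not YAMLE classes.
Sample = Tuple[float, float]  # (input, target)


class SketchDataModule:
    """Owns the data: a tiny synthetic regression set with y = 2x."""

    def prepare(self) -> None:
        self.train: List[Sample] = [(float(x), 2.0 * x) for x in range(1, 6)]
        self.test: List[Sample] = [(float(x), 2.0 * x) for x in range(6, 9)]


class SketchModel:
    """Owns the architecture: a single scalar weight, y_hat = w * x."""

    def __init__(self) -> None:
        self.w = 0.0

    def forward(self, x: float) -> float:
        return self.w * x


class SketchMethod:
    """Owns the training and test steps applied to a given model."""

    def __init__(self, model: SketchModel, lr: float = 0.01) -> None:
        self.model = model
        self.lr = lr

    def _loss(self, batch: Sample) -> Tuple[float, float]:
        x, y = batch
        err = self.model.forward(x) - y
        return 0.5 * err * err, err

    def training_step(self, batch: Sample) -> float:
        loss, err = self._loss(batch)
        # Gradient of 0.5 * err^2 with respect to w is err * x.
        self.model.w -= self.lr * err * batch[0]
        return loss

    def test_step(self, batch: Sample) -> float:
        loss, _ = self._loss(batch)
        return loss


class SketchTrainer:
    """Groups data module and method, and records metrics in a log."""

    def __init__(self, data: SketchDataModule, method: SketchMethod,
                 epochs: int = 50) -> None:
        self.data = data
        self.method = method
        self.epochs = epochs
        self.log: Dict[str, List[float]] = {"train_loss": [], "test_loss": []}

    def fit(self) -> None:
        self.data.prepare()
        for _ in range(self.epochs):
            losses = [self.method.training_step(b) for b in self.data.train]
            self.log["train_loss"].append(sum(losses) / len(losses))

    def test(self) -> float:
        losses = [self.method.test_step(b) for b in self.data.test]
        mean_loss = sum(losses) / len(losses)
        self.log["test_loss"].append(mean_loss)
        return mean_loss


# Usage: wire the three components together through the trainer.
model = SketchModel()
trainer = SketchTrainer(SketchDataModule(), SketchMethod(model, lr=0.01))
trainer.fit()
test_loss = trainer.test()
```

Because the method only touches the model through forward and its weight, a different training algorithm could in principle be swapped in without changing the data module or the trainer, which is the kind of reuse the caption describes. Hyperparameter optimisation would sit outside this loop, for example by constructing several trainers with different lr values and comparing their logged test losses.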