autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks

Simon Rampp, Andreas Triantafyllopoulos, Manuel Milling, Björn W. Schuller

TL;DR

autrainer is a PyTorch-based, audio-first toolkit designed to deliver rapid, reproducible, and extensible deep learning workflows for computer audition. It unifies data ingestion, feature extraction, augmentation, model training, postprocessing, and inference under Hydra-configured, low-code pipelines, with CLI and Python wrappers and a repository of pretrained models. The framework supports a broad set of tasks and models, emphasizes reproducibility and fair baselines, and provides tooling for experiment tracking and result aggregation. By offering off-the-shelf datasets, a growing model zoo, and an open-source community pathway, autrainer aims to democratise and accelerate DL research in audio.

Abstract

This work introduces the key operating principles for autrainer, our new deep learning training framework for computer audition tasks. autrainer is a PyTorch-based toolkit that allows for rapid, reproducible, and easily extensible training on a variety of different computer audition tasks. Concretely, autrainer offers low-code training and supports a wide range of neural networks as well as preprocessing routines. In this work, we present an overview of its inner workings and key capabilities.

Paper Structure

This paper contains 19 sections, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Schematic diagram of the autrainer workflow. The package can be installed via pip (or any other Python package manager of choice). Subsequently, the user has to specify datasets and models they want to train and a set of possible hyperparameters. autrainer fetch can be used to download datasets and model weights, while autrainer preprocess optionally performs offline feature extraction, and autrainer train conducts the training for each set of hyperparameters. Finally, autrainer postprocess can be used to summarise and aggregate results. The blue cards above the autrainer commands indicate the key functionality provided by autrainer while the grey cards below describe optional steps to extend or customise the functionality of the corresponding commands.
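
The command order described in the Figure 1 caption can be sketched as a small Python driver. This is a minimal illustration, not part of autrainer itself: the helper names `build_commands` and `run_workflow` are hypothetical, and the sketch assumes the `autrainer` executable is on the PATH after installation via pip. The caption does not list any arguments for the subcommands, so they are emitted bare here; some of them (for example `postprocess`) may need additional arguments in practice, and the toolkit's documentation should be consulted for those. By default the driver only prints the commands instead of running them.

```python
import shlex
import subprocess
from typing import List, Sequence

# Subcommands in the order given by the Figure 1 caption.
# Arguments are intentionally omitted because the caption does not state them.
STAGES: Sequence[str] = ("fetch", "preprocess", "train", "postprocess")


def build_commands(stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Return one argv list per workflow stage (hypothetical helper)."""
    return [["autrainer", stage] for stage in stages]


def run_workflow(dry_run: bool = True) -> List[str]:
    """Render the workflow commands and, unless dry_run is set, execute them.

    Assumes the `autrainer` CLI is installed and on the PATH when
    dry_run is False. Execution stops at the first failing stage.
    """
    rendered = []
    for cmd in build_commands():
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage fails,
            # so later stages never run on incomplete inputs.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run: only print the commands in workflow order.
    for line in run_workflow(dry_run=True):
        print(line)
```

Keeping the stages in a single ordered sequence mirrors the figure: fetching comes before the optional offline preprocessing, training follows, and aggregation of results comes last.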