autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp, Andreas Triantafyllopoulos, Manuel Milling, Björn W. Schuller
TL;DR
autrainer is a PyTorch-based, audio-first toolkit designed to deliver rapid, reproducible, and extensible deep learning workflows for computer audition. It unifies data ingestion, feature extraction, augmentation, model training, postprocessing, and inference under Hydra-configured, low-code pipelines, with CLI and Python wrappers and a repository of pretrained models. The framework supports a broad set of tasks and models, emphasizes reproducibility and fair baselines, and provides tooling for experiment tracking and result aggregation. By offering off-the-shelf datasets, a growing model zoo, and an open-source community pathway, autrainer aims to democratise and accelerate deep learning research in audio.
Abstract
This work introduces the key operating principles of autrainer, our new deep learning training framework for computer audition tasks. autrainer is a PyTorch-based toolkit that allows for rapid, reproducible, and easily extensible training on a variety of computer audition tasks. Concretely, autrainer offers low-code training and supports a wide range of neural networks as well as preprocessing routines. We present an overview of its inner workings and key capabilities.
