Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

Manuel Milling; Andreas Triantafyllopoulos; Iosif Tsangko; Simon David Noel Rampp; Björn Wolfgang Schuller

TL;DR

This work investigates whether minima sharpness, assessed via filter-normalised two-dimensional loss landscapes and the $\epsilon$-sharpness measure $s_\epsilon$, predicts generalisation in acoustic scene classification (ASC) on the DCASE2020 dataset. By training CNN10 and CNN14 models with diverse optimisers and performing a grid-search over hyperparameters, the authors quantify how sharpness correlates with in-distribution and out-of-distribution generalisation. Contrary to prevalent findings in computer vision, sharper minima here tend to generalise better, especially to unseen devices, highlighting the influence of optimisers on sharpness and the limitations of cross-domain comparability. The results motivate more efficient and interpretable sharpness measures and deeper analyses of hyperparameter effects to improve ASC robustness, with code and model states released publicly.

Abstract

The correlation between the sharpness of loss minima and generalisation in deep neural networks has long been a subject of discussion. Whilst this question has mostly been investigated on selected benchmark data sets in computer vision, we explore it for the acoustic scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to generalise better than flat minima, even more so for out-of-domain data recorded from previously unseen devices, thus adding to the dispute about the better generalisation capabilities of flat minima. We further find that the choice of optimiser, in particular, is a main driver of the sharpness of minima, and we discuss the resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
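To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' released code. It assumes the usual filter-normalisation recipe (each filter of a random direction is rescaled to the norm of the corresponding filter in the trained weights) and, since the paper's exact definition of $s_\epsilon$ is not reproduced in this summary, it borrows the form of Keskar et al.'s $\epsilon$-sharpness evaluated on the two-dimensional grid. The function names, the toy quadratic loss and the grid settings are all hypothetical.

```python
import numpy as np

def filter_normalise(direction, weights):
    """Rescale each filter of `direction` to the norm of the matching filter
    in `weights`. The first axis of both arrays indexes the filters."""
    d = direction.reshape(direction.shape[0], -1)
    w = weights.reshape(weights.shape[0], -1)
    d_norm = np.linalg.norm(d, axis=1, keepdims=True)
    w_norm = np.linalg.norm(w, axis=1, keepdims=True)
    scaled = d / (d_norm + 1e-10) * w_norm  # small constant avoids division by zero
    return scaled.reshape(direction.shape)

def loss_grid(loss_fn, weights, dir_x, dir_y, radius=1.0, steps=21):
    """Evaluate the loss on a square grid spanned by two directions.
    `steps` should be odd so that the grid centre is the trained minimum."""
    coords = np.linspace(-radius, radius, steps)
    grid = np.empty((steps, steps))
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            grid[i, j] = loss_fn(weights + a * dir_x + b * dir_y)
    return coords, grid

def grid_sharpness(coords, grid, eps):
    """Assumed Keskar-style measure: relative loss increase (in percent)
    within the box |a| <= eps, |b| <= eps around the grid centre."""
    centre = len(coords) // 2
    inside = np.abs(coords) <= eps + 1e-12
    mask = inside[:, None] & inside[None, :]
    base = grid[centre, centre]
    return 100.0 * (grid[mask].max() - base) / (1.0 + base)

# Toy usage: one "layer" with 4 filters and a quadratic loss around the minimum.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3, 3, 3))
minimum = weights.copy()

def toy_loss(w):
    return float(np.sum((w - minimum) ** 2))

dir_x = filter_normalise(rng.normal(size=weights.shape), weights)
dir_y = filter_normalise(rng.normal(size=weights.shape), weights)
coords, grid = loss_grid(toy_loss, weights, dir_x, dir_y)
sharp_small = grid_sharpness(coords, grid, eps=0.2)
sharp_large = grid_sharpness(coords, grid, eps=0.5)
```

In a real setting the directions would be drawn and normalised per layer of the network, and `toy_loss` would be replaced by the model's loss on a fixed evaluation set. Because the measure depends on the random directions, averaging over several direction pairs, as the paper's Figure 2 does with three plots per model state, gives a more stable estimate.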

Paper Structure (13 sections, 3 equations, 4 figures, 1 table)

Introduction
Methodology
Filter-Normalisation
Sharpness
Experiments and Discussion
Dataset
Model training
On the robustness towards random directions
On the impact of sharpness on generalisation
On the impact of hyperparameters on sharpness
Limitations
Conclusions
Acknowledgements

Figures (4)

Figure 1: Visualisation of the two-dimensional filter-normalised loss landscape for two different model states with different architectures and training paradigms.
Figure 2: Distribution of sharpness measures. Each bar indicates the mean sharpness value, with standard deviation, of a trained model state across three two-dimensional plots with different random directions.
Figure 3: Correlation plot between sharpness of minima (the higher, the sharper) and test accuracy for all trained models, showing the best-fit line and 95% confidence intervals for the different models.
Figure 4: Disaggregated distribution of mean sharpness and accuracy across hyperparameters. Each bar averages the mean sharpness or accuracy of all trained model states, grouped by the different types of hyperparameters.
