
If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models

David A. Kelly, Hana Chockler

Abstract

In order to gain fresh insights into the information processing characteristics of different audio classification models, we propose transferability analysis. Given a minimal, sufficient signal for a classification on a model $f$, transferability analysis asks whether other models accept this minimal signal as having the same classification as it did on $f$. We define what it means for a sufficient signal to be transferable and perform a large study over $3$ different classification tasks: music genre, emotion recognition, and deepfake detection. We find that transferability rates vary depending on the task, with sufficient signals for music genre being transferable $\approx26\%$ of the time. The other tasks exhibit much higher variance in transferability and reveal that some models, in particular on deepfake detection, have different transferability behavior. We call these models `flat-earther' models. We investigate deepfake audio in more depth and show that transferability analysis also allows us to discover information-theoretic differences between the models which are not captured by the more familiar metrics of accuracy and precision.

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The partitioning of 'disgust' (a) into sufficient signal (b), sufficient and necessary (complete) signal (c) and $1$-complete signal (d). A $1$-complete signal is sufficient and necessary for 'disgust' and also has the same confidence as the original signal (a). We investigate how transferable these signals and classifications are across different models.
  • Figure 2: Graphical representation of an audio depth-$2$ causal model with $n$ frequencies as input, and classification as outcome. The internal node $f_i$ introduces the network $f$ as a black-box.
  • Figure 3: Power spectral density of real and fake data on ASVSpoof2019 (top $2$ rows) and ITW (bottom two rows). The difference between SP$_1$ and the other models is clear.
  • Figure 4: Spectral entropy across real and fake sufficiencies on ASVSpoof2019 (top) and ITW (bottom), for $3$ models.

Theorems & Definitions (5)

  • Definition 1: Sufficient Explanation
  • Definition 2: Sufficient Subset
  • Definition 3: $\delta$-Complete Subset
  • Definition 4: Transferability
  • Definition 5: Partial Transferability
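
The transferability check described in the abstract and formalized in Definitions 4 and 5 can be illustrated with a minimal sketch. This is not the paper's implementation: models are assumed to be plain callables mapping a signal to a class label, and all function names here are hypothetical.

```python
# Hedged sketch of transferability analysis. A sufficient signal extracted
# for model f is "transferable" to another model g if g assigns it the same
# classification that f did (Definition 4). All names are illustrative.

def is_transferable(sufficient_signal, label, other_model):
    """True if `other_model` accepts the sufficient signal as `label`."""
    return other_model(sufficient_signal) == label

def transferability_rate(sufficient_signals, labels, other_models):
    """Fraction of sufficient signals accepted by *every* other model.
    Partial transferability (Definition 5) would instead count agreement
    per model rather than requiring unanimity."""
    hits = sum(
        all(is_transferable(s, y, g) for g in other_models)
        for s, y in zip(sufficient_signals, labels)
    )
    return hits / len(labels)

# Toy example: two "models" that classify a signal by its mean amplitude,
# with different decision thresholds.
f = lambda x: "fake" if sum(x) / len(x) > 0.5 else "real"
g = lambda x: "fake" if sum(x) / len(x) > 0.4 else "real"

signals = [[0.9, 0.8], [0.1, 0.2]]          # stand-ins for audio sufficiencies
labels = [f(s) for s in signals]            # classifications on the source model
rate = transferability_rate(signals, labels, [g])
```

In the study itself, the sufficient signals would be the minimal audio subsets of Figure 1, and the other models would be the alternative classifiers for the same task; the rate above corresponds to the per-task transferability percentages reported in the paper.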