On the reliability of feature attribution methods for speech classification

Gaofei Shen, Hosein Mohebbi, Arianna Bisazza, Afra Alishahi, Grzegorz Chrupała

TL;DR

This paper examines how reliable feature attribution methods are when used to explain speech classification models. It systematically evaluates four attribution methods across input types (waveform, spectrogram, CNN embeddings) and aggregation/perturbation granularities, using the inter-seed agreement (ISA) metric on three speech tasks (Gender ID, Speaker ID, and Intent with Action/Object/Location). The main result is that most standard attribution methods are unreliable for speech: Integrated Gradients is the most reliable of them, but its reliability is typically modest, and the strongest reliability comes from word-aligned perturbations on word-based tasks. The findings highlight the need for speech-specific attribution methods and for careful consideration of task structure and representation when interpreting speech models.

Abstract

As the capabilities of large-scale pre-trained models evolve, understanding the determinants of their outputs becomes more important. Feature attribution aims to reveal which parts of the input contribute the most to model outputs. In speech processing, the unique characteristics of the input signal make the application of feature attribution methods challenging. We study how factors such as input type and the timespan of aggregation and perturbation affect the reliability of standard feature attribution methods, and how these factors interact with the characteristics of each classification task. We find that standard approaches to feature attribution are generally unreliable when applied to the speech domain, with the exception of word-aligned perturbation methods applied to word-based classification tasks.

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Distributions of ISA scores without aggregation. The rows indicate different input feature types, the columns are different tasks. Within each panel, each boxplot shows results from different attribution methods and the y-axis is the ISA score. The red dotted line indicates the randomly shuffled baseline. IG: Integrated Gradients, FA: Feature Ablation.
  • Figure 2: Distributions of ISA scores for the CNN embedding input type, at different levels of aggregation. The rows are levels of granularity of aggregation, the columns are different tasks. Within each panel, each boxplot shows results from different attribution methods and the y-axis is the ISA score. The red dotted line indicates the randomly shuffled baseline. IG: Integrated Gradients, FA: Feature Ablation.
  • Figure 3: Distributions of ISA scores with perturbation operating directly on word-aligned segments. The rows indicate different input feature types, the columns are different tasks. Within each panel, each boxplot reports results from different attribution methods and the y-axis is the ISA score. The red dotted line indicates the randomly shuffled baseline. FA: Feature Ablation.
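
The captions above report ISA scores against a randomly shuffled baseline, but this page does not state how ISA is computed. The sketch below shows one plausible reading, offered only as an assumption: ISA as the mean pairwise Pearson correlation between the attribution vectors that models trained with different random seeds produce for the same input, with the baseline obtained by shuffling each vector independently before comparing. The function names (`pearson`, `inter_seed_agreement`, `shuffled_baseline`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined for constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def inter_seed_agreement(attributions):
    """Assumed definition: mean pairwise correlation across seeds for one input."""
    pairs = itertools.combinations(attributions, 2)
    scores = [pearson(a, b) for a, b in pairs]
    return sum(scores) / len(scores)


def shuffled_baseline(attributions, rng):
    """Agreement after independently shuffling each seed's attribution vector."""
    shuffled = [rng.sample(list(a), len(a)) for a in attributions]
    return inter_seed_agreement(shuffled)


# Synthetic example: three "seeds" that share one underlying attribution pattern.
rng = random.Random(0)
pattern = [rng.gauss(0.0, 1.0) for _ in range(200)]
attributions = [[p + 0.1 * rng.gauss(0.0, 1.0) for p in pattern] for _ in range(3)]

isa = inter_seed_agreement(attributions)
baseline = shuffled_baseline(attributions, rng)
print(f"ISA: {isa:.3f}  shuffled baseline: {baseline:.3f}")
```

Under this reading, an attribution method is reliable on an input when its ISA sits well above the shuffled baseline, which is the comparison the red dotted line in the figures supports.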