"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding

Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis

TL;DR

This work introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding (SLU). It spans four datasets in four languages and evaluates eight unlearning methods with a new Global Unlearning Metric (GUM) that jointly considers efficacy, efficiency, and utility. The study finds that Negative Gradients (NG) offers the most balanced performance across tasks and models, achieving large speedups while retaining little information about the forgotten data. The results underscore the importance of gold (ground-truth) reference models for evaluating unlearning and reveal substantial variability across datasets and languages, motivating further research in privacy-preserving SLU. Overall, the benchmark lays a foundation for rigorous, multi-faceted evaluation of unlearning techniques in speech tasks relevant to privacy-preserving voice assistants.

Abstract

Machine unlearning, the process of efficiently removing specific information from machine learning models, is a growing area of interest for responsible AI. However, few studies have explored the effectiveness of unlearning methods on complex tasks, particularly speech-related ones. This paper introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding (SLU), focusing on four datasets spanning four languages. We address the unlearning of data from specific speakers as a way to evaluate the quality of potential "right to be forgotten" requests. We assess eight unlearning techniques and propose a novel metric that better captures their efficacy, utility, and efficiency simultaneously. UnSLU-BENCH sets a foundation for unlearning in SLU and reveals significant differences in the effectiveness and computational feasibility of various techniques.

Paper Structure

This paper contains 10 sections, 3 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Trade-off between utility (test and forget F1) and efficacy (MIA) on NG, as the LR changes (ITALIC, XLS-R 53-IT).
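Figure 1 concerns how NG behaves as the learning rate changes. This page does not spell out the NG configuration used in the paper, so the following is only a minimal sketch of the idea as it is commonly described: starting from a trained model, take a few gradient ascent steps on the loss of the forget set. The 1-D logistic model, the toy `retain` and `forget` points, and every function name are hypothetical illustrations, not the paper's speech models or code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss_and_grad(w, b, data):
    """Mean binary cross-entropy and its gradient for a 1-D logistic model."""
    loss, gw, gb = 0.0, 0.0, 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        gw += (p - y) * x
        gb += p - y
    n = len(data)
    return loss / n, gw / n, gb / n

def train(data, lr=0.5, steps=200):
    """Ordinary gradient descent on the full training set."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, gw, gb = mean_loss_and_grad(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b

def negative_gradient_unlearn(w, b, forget, lr, steps=5):
    """Gradient ASCENT on the forget-set loss (assumed form of NG)."""
    for _ in range(steps):
        _, gw, gb = mean_loss_and_grad(w, b, forget)
        w += lr * gw  # sign flipped relative to training
        b += lr * gb
    return w, b

# Hypothetical toy data: (feature, label) pairs.
retain = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (1.0, 1), (1.5, 1), (2.0, 1)]
forget = [(-0.5, 0), (0.5, 1)]

w0, b0 = train(retain + forget)
forget_before, _, _ = mean_loss_and_grad(w0, b0, forget)

results = []  # (lr, forget loss after, retain loss after)
for lr in (0.01, 0.1, 1.0):
    w1, b1 = negative_gradient_unlearn(w0, b0, forget, lr)
    forget_after, _, _ = mean_loss_and_grad(w1, b1, forget)
    retain_after, _, _ = mean_loss_and_grad(w1, b1, retain)
    results.append((lr, forget_after, retain_after))
    print(f"lr={lr}: forget loss {forget_before:.4f} -> {forget_after:.4f}, "
          f"retain loss {retain_after:.4f}")
```

Printing the retain loss next to the forget loss shows, on this toy problem, the kind of tension the figure caption describes: the same update that pushes the model away from the forget set also moves it on the data it is meant to keep, and the learning rate controls how far.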