"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding
Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis
TL;DR
This work introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding. It spans four datasets in four languages and evaluates eight unlearning methods with a new Global Unlearning Metric (GUM) that jointly considers efficacy, efficiency, and utility. The study shows that Negative Gradients (NG) provides the most balanced performance across tasks and models, achieving large speedups while retaining little information about the forgotten data. The results underscore the importance of gold-standard reference models for evaluating unlearning and reveal substantial variability across datasets and languages, motivating further research in privacy-preserving SLU. Overall, the benchmark establishes a foundation for rigorous, multi-faceted evaluation of unlearning techniques in speech tasks relevant to voice assistants.
Abstract
Machine unlearning, the process of efficiently removing specific information from machine learning models, is a growing area of interest for responsible AI. However, few studies have explored the effectiveness of unlearning methods on complex tasks, particularly speech-related ones. This paper introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding (SLU), covering four datasets in four languages. We address the unlearning of data from specific speakers as a way to evaluate how well potential "right to be forgotten" requests can be honored. We assess eight unlearning techniques and propose a novel metric that jointly captures their efficacy, utility, and efficiency. UnSLU-BENCH sets a foundation for unlearning in SLU and reveals significant differences in the effectiveness and computational feasibility of various techniques.
