Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

TL;DR

Multi-task spoken language understanding (SLU) models are powerful but large, and they are prone to catastrophic forgetting when adapted to new data for a single task. The authors identify task-specific subnetworks within a shared UniverSLU model via iterative global magnitude pruning: each task $t$ is represented by the shared parameters under a pruning mask $m_t$, and only the parameters kept by that mask are updated. The pruned subnetworks achieve competitive or improved emotion recognition (ER) and intent classification (IC) performance at $36\%$ or $67\%$ sparsity and forget less in continual learning, though automatic speech recognition (ASR) can be sensitive to high sparsity. The approach enables parameter-efficient deployment, supports continual learning, scales to additional tasks, and reveals how much the subnetworks of different tasks overlap.

Abstract

Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks. However, these models often rely on a large number of parameters. Also, they often encounter difficulties in adapting to new data for a specific task without experiencing catastrophic forgetting of previously trained tasks. In this study, we propose finding task-specific subnetworks within a multi-task SLU model via neural network pruning. In addition to model compression, we expect that the forgetting of previously trained tasks can be mitigated by updating only a task-specific subnetwork. We conduct experiments on top of the state-of-the-art multi-task SLU model "UniverSLU", trained for several tasks such as emotion recognition (ER), intent classification (IC), and automatic speech recognition (ASR). We show that pruned models were successful in adapting to additional ASR or IC data with minimal performance degradation on previously trained tasks.
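
The pruning-and-masked-update idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy two-layer parameter dictionary, the plain SGD step, and the 20% per-round pruning fraction are all assumptions made for illustration. The 20% figure is chosen only because two such rounds leave $0.8^2 = 64\%$ of the weights (36% sparsity) and five rounds leave $0.8^5 \approx 33\%$ (about 67% sparsity), which matches the sparsity levels quoted in the TL;DR.

```python
from typing import Dict, List

Params = Dict[str, List[float]]
Mask = Dict[str, List[int]]


def global_magnitude_mask(params: Params, mask: Mask, prune_frac: float) -> Mask:
    """Drop the smallest-magnitude `prune_frac` of the weights still kept.

    "Global" means all layers are ranked together, not layer by layer.
    """
    kept = sorted(
        (abs(w), name, i)
        for name, ws in params.items()
        for i, w in enumerate(ws)
        if mask[name][i] == 1
    )
    n_drop = int(len(kept) * prune_frac)
    new_mask = {name: list(m) for name, m in mask.items()}
    for _, name, i in kept[:n_drop]:
        new_mask[name][i] = 0
    return new_mask


def masked_sgd_step(params: Params, grads: Params, mask: Mask, lr: float) -> Params:
    """Update only weights inside the subnetwork; pruned weights stay as they are."""
    return {
        name: [w - lr * g * m for w, g, m in zip(ws, grads[name], mask[name])]
        for name, ws in params.items()
    }


def sparsity(mask: Mask) -> float:
    """Fraction of weights removed by the mask."""
    total = sum(len(m) for m in mask.values())
    return 1.0 - sum(sum(m) for m in mask.values()) / total


# Toy model (hypothetical): two "layers" of 50 weights each.
params = {
    "enc": [0.01 * (i + 1) * (-1) ** i for i in range(50)],
    "dec": [0.02 * (i + 1) * (-1) ** i for i in range(50)],
}
mask = {name: [1] * len(ws) for name, ws in params.items()}

# Two pruning rounds at an assumed 20% each. A real iterative schedule
# would retrain the model between rounds; that step is omitted here.
for _ in range(2):
    mask = global_magnitude_mask(params, mask, 0.2)

# One masked update with dummy gradients of 1.0 everywhere.
grads = {name: [1.0] * len(ws) for name, ws in params.items()}
updated = masked_sgd_step(params, grads, mask, lr=0.1)

print(f"sparsity after two rounds: {sparsity(mask):.2f}")
```

Because the gradient is multiplied by the mask, weights outside the task's subnetwork are never modified, which is the mechanism the abstract relies on to limit forgetting of other tasks.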

Paper Structure (9 sections, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: Illustration of task-specific subnetworks in a multi-task SLU model. To solve the SER (speech emotion recognition) task, only the subnetwork shown as green pathways is activated.
  • Figure 2: Continual learning on (a) LibriSpeech ASR and (b) SLURP IC. We additionally trained models on LibriSpeech $360$h or SLURP real+synthetic data.
  • Figure 3: Parameter overlap ratio between tasks.
  • Figure : Identify pruning mask
  • Figure : Update parameters
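
As a rough illustration of what a parameter overlap ratio between two tasks (Figure 3) could measure, the sketch below compares two binary pruning masks. The exact definition used in the paper is not given in this summary; intersection-over-union of the kept parameters is an assumption here, and the function name and toy masks are hypothetical.

```python
from typing import List


def overlap_ratio(mask_a: List[int], mask_b: List[int]) -> float:
    """Share of parameters kept by both masks among those kept by either.

    Intersection-over-union is an assumed definition, used for illustration.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same parameters")
    both = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    either = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return both / either if either else 0.0


# Toy masks over six shared parameters (1 = kept, 0 = pruned).
er_mask = [1, 1, 0, 1, 0, 0]
asr_mask = [1, 0, 0, 1, 1, 0]

ratio = overlap_ratio(er_mask, asr_mask)
print(ratio)  # 2 shared out of 4 kept by either -> 0.5
```

A high ratio would suggest that two tasks rely on largely the same weights, while a low ratio would suggest more task-specific pathways.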