Out-of-distribution generalisation in spoken language understanding

Dejan Porjazovski, Anssi Moisio, Mikko Kurimo

TL;DR

A modified version of the popular SLU dataset SLURP is introduced, featuring data splits for testing OOD generalisation in the SLU task, and end-to-end SLU models are found to have limited capacity for generalisation.

Abstract

Test data is said to be out-of-distribution (OOD) when it unexpectedly differs from the training data, a common challenge in real-world applications of machine learning. Although OOD generalisation has gained interest in recent years, few works have focused on OOD generalisation in spoken language understanding (SLU) tasks. To facilitate research on this topic, we introduce a modified version of the popular SLU dataset SLURP, featuring data splits for testing OOD generalisation in the SLU task. We call our modified dataset SLURP For OOD generalisation, or SLURPFOOD. Using these OOD data splits, we find that end-to-end SLU models have limited capacity for generalisation. Furthermore, by employing model interpretability techniques, we shed light on the factors contributing to the generalisation difficulties of the models. To improve generalisation, we experiment with two techniques, which improve the results on some, but not all, of the splits, emphasising the need for new techniques.
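
To make the idea of an OOD data split concrete, the sketch below shows how two kinds of split could be carved out of intent-labelled utterances, assuming SLURP-style intents made of a scenario and an action: an out-of-vocabulary split, where every utterance containing a held-out word is moved to the test set, and a compositional split, where certain (scenario, action) combinations are held out while their scenario and action each still appear in training. This is a hypothetical illustration with toy data, naive whitespace tokenisation and made-up function names (`oov_split`, `compositional_split`); it is not the procedure used to build SLURPFOOD.

```python
from typing import List, Set, Tuple

# (transcript, scenario, action); toy examples, not taken from SLURPFOOD
Example = Tuple[str, str, str]


def oov_split(data: List[Example], held_out: Set[str]):
    """Hypothetical OOV-style split: utterances with a held-out word go to test."""
    train, test = [], []
    for text, scenario, action in data:
        tokens = set(text.lower().split())  # naive whitespace tokenisation
        (test if tokens & held_out else train).append((text, scenario, action))
    return train, test


def compositional_split(data: List[Example], held_out: Set[Tuple[str, str]]):
    """Hypothetical compositional split: hold out (scenario, action) pairs."""
    train = [ex for ex in data if (ex[1], ex[2]) not in held_out]
    test = [ex for ex in data if (ex[1], ex[2]) in held_out]
    seen_scenarios = {s for _, s, _ in train}
    seen_actions = {a for _, _, a in train}
    # A held-out pair only tests composition if both parts occur in training.
    for _, s, a in test:
        if s not in seen_scenarios or a not in seen_actions:
            raise ValueError(f"{s}_{a}: scenario or action never seen in training")
    return train, test


data = [
    ("wake me up at seven", "alarm", "set"),
    ("cancel my alarm", "alarm", "remove"),
    ("play some jazz", "play", "music"),
    ("remind me to call mum", "calendar", "set"),
    ("delete the meeting", "calendar", "remove"),
]

oov_train, oov_test = oov_split(data, {"jazz"})
cg_train, cg_test = compositional_split(data, {("alarm", "remove")})
print(len(oov_train), len(oov_test), len(cg_train), len(cg_test))
```

In the compositional example, `alarm_remove` is unseen at training time even though `alarm_set` and `calendar_remove` are both in the training set, so a model must combine known parts in a new way.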

Paper Structure

This paper contains 10 sections, 1 figure and 2 tables.

Figures (1)

  • Figure 1: The three words that are most frequently the single most important word for the prediction (colour intensity signifies frequency), as determined by the integrated gradients (IG) method, for the out-of-vocabulary (OOV) and compositional generalisation (CG) splits. In each matrix, the first two rows show the most important words for correctly predicted samples from the IID and OOD splits, respectively. The third row shows the most important words for the incorrectly classified samples. The last row shows, for each true class, the class it is most commonly confused with on the OOD split.
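
The caption describes an aggregation: for each sample, find the word with the highest IG attribution, then count how often each word is the top one across a group of samples. The following is a minimal sketch of that procedure under our own assumptions, not the authors' model or code: a toy differentiable class score `f` over hypothetical 2-D word embeddings, a zero baseline, a midpoint Riemann approximation of the IG path integral, and word-level scores obtained by summing attributions over the embedding dimension. All names (`f`, `grad_f`, `integrated_gradients`, `top_k_words`) are illustrative.

```python
from collections import Counter

import numpy as np

W = np.array([1.0, 0.5])  # toy weights standing in for a real SLU model


def f(x):
    """Toy class score: tanh of summed word projections. x has shape (words, dim)."""
    return np.tanh((x @ W).sum())


def grad_f(x):
    """Analytic gradient of f with respect to every word embedding."""
    s = (x @ W).sum()
    return (1.0 - np.tanh(s) ** 2) * np.tile(W, (x.shape[0], 1))


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """IG: (x - baseline) times the average gradient along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule on [0, 1]
    avg_grad = sum(grad_fn(baseline + a * (x - baseline)) for a in alphas) / steps
    return (x - baseline) * avg_grad


def most_important_word(words, x):
    attr = integrated_gradients(grad_f, x, np.zeros_like(x))
    word_scores = attr.sum(axis=1)  # one score per word
    return words[int(np.argmax(word_scores))]


def top_k_words(samples, k=3):
    """Count how often each word is the single most important one; keep top k."""
    counts = Counter(most_important_word(words, x) for words, x in samples)
    return counts.most_common(k)


samples = [
    (["set", "alarm", "please"], np.array([[0.1, 0.0], [1.0, 1.0], [0.0, 0.2]])),
    (["wake", "me", "up"], np.array([[0.8, 0.0], [0.0, 0.1], [0.1, 0.0]])),
    (["alarm", "at", "seven"], np.array([[0.9, 0.4], [0.0, 0.0], [0.3, 0.0]])),
    (["play", "music"], np.array([[0.2, 0.0], [0.6, 0.6]])),
]
print(top_k_words(samples))
```

In the figure's setting, `top_k_words` would be run separately on correctly predicted IID samples, correctly predicted OOD samples, and misclassified samples. A real pipeline would typically obtain gradients through automatic differentiation (for example, Captum's `IntegratedGradients`) rather than an analytic `grad_f`.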