Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance

Sourya Dipta Das, Yash Vadi, Abhishek Unnam, Kuldeep Yadav

TL;DR

This work tackles open-set dialect identification by proposing an unsupervised OOD detection method that uses Mahalanobis distance features derived from the latent embeddings of all transformer layers of a wav2vec 2.0 dialect classifier. Layer-wise means and covariances are computed from the closed-set training data, and the per-layer Mahalanobis scores are concatenated into a feature vector $V_{MD}(x)$. This vector feeds a KNN-based outlier detector that produces a class rejection score $G_D(x)$, which is compared against a threshold $\delta$. The method preserves strong closed-set performance while significantly improving OOD detection, outperforming several state-of-the-art detectors on English and Spanish dialect datasets. The findings highlight the effectiveness of multi-layer representations for open-world dialect classification and suggest potential gains from adversarial and contrastive learning in future work.

Abstract

Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. These OOD samples can lead to unexpected outputs, as their dialects are unseen during model training. Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification. To this end, we propose a simple yet effective unsupervised Mahalanobis distance feature-based method to detect out-of-distribution samples. We utilize the latent embeddings from all intermediate layers of a wav2vec 2.0 transformer-based dialect classifier model for multi-task learning. Our proposed approach significantly outperforms other state-of-the-art OOD detection methods.

Paper Structure (13 sections, 3 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Illustration of the estimation of the layer feature embedding mean ($\mu_k$) and covariance matrix ($\Sigma_k$) for the $k$-th transformer layer from the feature embeddings $F^{k}_D(x)$ of the training data $D_{train}$. Here, Avg is the component-wise vector average operation and MLCE is the Maximum Likelihood Covariance Estimator.
  • Figure 2: OpenSet wav2vec 2.0 Dialect Classifier Architecture. Here, $\oplus$ is the concatenation operator and $V_{MD}(x) = [V_{MD}^1(x)\oplus V_{MD}^2(x)\oplus\dots\oplus V_{MD}^K(x)]$ is the Mahalanobis distance feature vector.
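The pipeline described in the two captions can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one class-agnostic mean and covariance per layer, one scalar squared Mahalanobis distance per layer as $V_{MD}^k(x)$, a rejection score equal to the mean Euclidean distance to the `k` nearest training feature vectors, and a threshold $\delta$ set to the 95th percentile of scores on held-out in-distribution data. The function names and the synthetic Gaussian "embeddings" are hypothetical stand-ins for the real wav2vec 2.0 layer outputs.

```python
import numpy as np

def fit_layer_stats(layer_embeddings):
    """Estimate (mu_k, inverse Sigma_k) for each layer from training embeddings.

    layer_embeddings: list of K arrays, each of shape (N, d_k).
    """
    stats = []
    for F in layer_embeddings:
        mu = F.mean(axis=0)                       # component-wise average (Avg)
        centered = F - mu
        cov = centered.T @ centered / F.shape[0]  # maximum likelihood estimate (divides by N)
        stats.append((mu, np.linalg.pinv(cov)))   # pseudo-inverse guards against singular Sigma_k
    return stats

def md_features(layer_embeddings, stats):
    """Concatenate per-layer squared Mahalanobis distances into an (N, K) feature matrix."""
    cols = []
    for F, (mu, inv_cov) in zip(layer_embeddings, stats):
        diff = F - mu
        cols.append(np.einsum("nd,de,ne->n", diff, inv_cov, diff))
    return np.stack(cols, axis=1)

def knn_score(train_feats, query_feats, k=5):
    """Assumed rejection score: mean distance to the k nearest training feature vectors."""
    dists = np.linalg.norm(query_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

# Synthetic stand-in for K=3 layers of 8-dimensional embeddings (hypothetical data).
rng = np.random.default_rng(0)
K, d = 3, 8
train = [rng.normal(0.0, 1.0, size=(200, d)) for _ in range(K)]
val = [rng.normal(0.0, 1.0, size=(50, d)) for _ in range(K)]   # held-out in-distribution
ood = [rng.normal(4.0, 1.0, size=(50, d)) for _ in range(K)]   # shifted, "unseen dialect"

stats = fit_layer_stats(train)
train_feats = md_features(train, stats)
val_scores = knn_score(train_feats, md_features(val, stats))
ood_scores = knn_score(train_feats, md_features(ood, stats))

delta = np.percentile(val_scores, 95)  # assumed threshold choice
is_ood = ood_scores > delta            # True means the sample is rejected as OOD
```

In this sketch a sample is rejected when its score exceeds `delta`; the percentile used to set the threshold is an arbitrary choice for illustration and would be tuned on validation data in practice.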