Are Word Embedding Methods Stable and Should We Care About It?

Angana Borah, Manash Pratim Barman, Amit Awekar

TL;DR

Problem: Word Embedding Methods (WEMs) can produce different embeddings from one training run to the next, and this instability can affect downstream NLP tasks.

Approach: The paper compares Word2Vec, GloVe, and fastText on four real-world corpora using an intrinsic stability metric based on the overlap of $k$ nearest neighbors (KNN). For a word $w$ and two embeddings $E_1$ and $E_2$, $\mathrm{stability}(w) = |\mathrm{KNN}_{E_1}(w) \cap \mathrm{KNN}_{E_2}(w)| / k$, and the stability of a WEM $A$, $\mathrm{stability}(A)$, is the average of $\mathrm{stability}(w)$ over words $w$. The effect of five training parameters on this metric is evaluated.

Contributions: a cross-dataset stability assessment of three popular WEMs; a detailed analysis of parameter effects ($k$, dimensions, epochs, window size, frequency); and links between stability and outcomes on clustering, POS tagging, and WEAT fairness evaluation.

Findings: fastText is consistently the most stable and Word2Vec the least stable; stability generally improves as the number of dimensions grows, up to about 300, and then plateaus; and some instability can even improve downstream POS tagging performance.

Significance: The results guide practitioners in selecting WEMs and hyperparameters to obtain robust word representations and reliable downstream performance.
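
The KNN-overlap metric above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity as the word-similarity measure, and the assumption that both runs share exactly the same vocabulary are all illustrative choices, and the toy two-dimensional vectors are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn(emb, word, k):
    """Set of the k words closest to `word` in `emb` (dict: word -> vector)."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])


def word_stability(emb1, emb2, word, k):
    """stability(w) = |KNN_E1(w) & KNN_E2(w)| / k."""
    return len(knn(emb1, word, k) & knn(emb2, word, k)) / k


def wem_stability(emb1, emb2, k):
    """Average of stability(w) over all words (assumes a shared vocabulary)."""
    words = list(emb1)
    return sum(word_stability(emb1, emb2, w, k) for w in words) / len(words)


# Hypothetical toy embeddings standing in for two training runs.
run_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
run_b = {"cat": [1.0, 0.0], "car": [0.9, 0.1], "dog": [0.0, 1.0], "bus": [0.1, 0.9]}

print(wem_stability(run_a, run_a, k=1))  # a run compared with itself
print(wem_stability(run_a, run_b, k=2))  # two runs with different neighborhoods
```

The sketch compares a single pair of embeddings; how results from more than two runs are aggregated is not specified in this summary, so that step is left out.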

Abstract

A representation learning method is considered stable if it consistently generates similar representations of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate a dense vector representation for each word in the given text data. The central idea of this paper is to measure the stability of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on three downstream tasks: Clustering, POS tagging, and Fairness evaluation. Our experiments indicate that among the three WEMs, fastText is the most stable, followed by GloVe and then Word2Vec.

Paper Structure

This paper contains 17 sections, 2 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: WEM stability plateaus for higher values of $k$
  • Figure 2: All word groups have high variance in word stability for fastText
  • Figure 3: All word groups have high variance in word stability for GloVe
  • Figure 4: All word groups have high variance in word stability for Word2Vec
  • Figure 5: For a set of randomly sampled words, stability of words with each WEM
  • ...and 7 more figures